6  Self study - Session 2

In this session, we will explore the dataset in Chapter 4 further.

Comparing our findings in Section 4.1.6.1 and Section 4.1.6.2 with the failure description in Section 4.1.7 shows that we did a fairly good job of identifying the causes of failure. However, we could have done better for overstrain failure (OSF).

Exercise 6.1 (Overstrain failure)  

According to Section 4.1.7, overstrain failure (OSF) occurs when the machine is subjected to excessive torque while a worn-out tool is in use. The product variant also plays a role: when the product of torque and tool wear exceeds 11,000 min·Nm for the L product variant (12,000 for M, 13,000 for H), the process fails due to overstrain.

Can you verify this relationship with visualizations?
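As a starting point, the threshold rule quoted above can be sketched as a derived column. This is only a minimal sketch: the column names `Type`, `Torque [Nm]`, `Tool wear [min]` and `OSF` are assumptions about how the notebook's DataFrame is labelled, the helper `add_strain` is hypothetical, and the four rows are toy values made up for illustration, not taken from the dataset.

```python
import pandas as pd

# Thresholds in min·Nm per product variant, as quoted in the exercise text.
OSF_LIMIT = {"L": 11_000, "M": 12_000, "H": 13_000}

# Toy rows for illustration only (NOT taken from the dataset).
# Column names are assumptions; adjust them to your DataFrame.
toy = pd.DataFrame({
    "Type": ["L", "L", "M", "H"],
    "Torque [Nm]": [60.0, 40.0, 65.0, 50.0],
    "Tool wear [min]": [200, 200, 180, 250],
    "OSF": [1, 0, 0, 0],
})


def add_strain(df):
    """Return a copy with torque x tool wear and the variant-specific limit."""
    out = df.copy()
    out["strain"] = out["Torque [Nm]"] * out["Tool wear [min]"]
    out["limit"] = out["Type"].map(OSF_LIMIT)
    out["above_limit"] = out["strain"] > out["limit"]
    return out


checked = add_strain(toy)
# Cross-tabulate the rule against the recorded failure flag.
print(pd.crosstab(checked["above_limit"], checked["OSF"]))
```

For the visual check, one option is to plot torque against tool wear separately for each product variant, color the points by `OSF`, and see whether the failures lie beyond the curve where the product of the two equals the variant's limit.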

Also, the description of tool wear failure (TWF) in Section 4.1.7 seems to contradict our observations.

Exercise 6.2 (Tool wear failure)  

Identify where the description of TWF in Section 4.1.7 does not align with your findings.
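One possible way to start is to summarize the tool wear values of the rows with and without TWF and compare the observed range with the one given in the description. The sketch below assumes the columns are named `Tool wear [min]` and `TWF`; the rows are toy values invented for illustration and say nothing about the real data.

```python
import pandas as pd

# Toy rows for illustration only (NOT taken from the dataset).
# Column names are assumptions; adjust them to your DataFrame.
toy = pd.DataFrame({
    "Tool wear [min]": [50, 120, 205, 215, 230, 245],
    "TWF": [0, 0, 1, 0, 1, 0],
})

# Observed tool wear range and row count, split by the TWF flag.
summary = toy.groupby("TWF")["Tool wear [min]"].agg(["min", "max", "count"])
print(summary)
```

Applied to the full dataset, the same summary shows at which tool wear values TWF occurs and how many rows in that range did not fail.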

So far, we have used seaborn to visualize the data. seaborn is based on matplotlib and provides a high-level interface for drawing statistical graphics. matplotlib supports interactive figures that can zoom, pan, and update (see the matplotlib documentation on interactive figures), but for richer interactive visualizations we can use plotly. plotly is an open-source Python library built on top of the JavaScript library plotly.js. It creates interactive, web-based visualizations that can be displayed in Jupyter notebooks or saved as standalone HTML files. For exploratory data analysis in particular, plotly offers a more user-friendly and flexible way to create and explore complex visualizations.

Exercise 6.3 (Introducing Plotly)  

Consult the Plotly documentation for more information on how to use the library. Start with the notebook from Chapter 5, recreate the visualizations from Chapter 4 using plotly, and explore the data interactively.

In Section 4.1.6, we used pair plots, boxplots, and decision trees to investigate the relationship between different features and the machine failure types. Another approach could be to use parallel coordinates plots.

Exercise 6.4 (Parallel coordinates plots)  

Use parallel coordinates plots to visualize the relationships between multiple features and the machine failure types:

  1. Display every available continuous variable along with one selected failure type.
  2. Also add Type as a categorical variable to the plot.
  3. Make the failure type categorical.
  4. Color the lines according to the selected failure type.
  5. Allow for choosing the failure type interactively using a Dropdown widget (see Jupyter widgets) with the available failure types.
  6. The parallel coordinates plot lets you reorder columns by clicking and dragging a column title. It also lets you select and highlight specific lines, which makes it easier to focus on particular data points. Use these two features to gain insight into the relationships between the features and the selected failure type. Can you identify the patterns that we exploited in our previous analyses?
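The steps above can be sketched with `plotly.graph_objects.Parcoords` and an `ipywidgets` Dropdown. This is one possible approach, not the reference solution. The column names and failure type labels are assumptions about the notebook's DataFrame, the helpers `build_dimensions`, `make_figure` and `explore` are hypothetical, and the toy rows are made up for illustration. Because `Parcoords` axes are numeric, the categorical variables are encoded as integer codes and labelled through `tickvals` and `ticktext`.

```python
import pandas as pd

# Assumed column names and failure labels; adjust them to your DataFrame.
CONTINUOUS = [
    "Air temperature [K]",
    "Process temperature [K]",
    "Rotational speed [rpm]",
    "Torque [Nm]",
    "Tool wear [min]",
]
TYPE_ORDER = ["L", "M", "H"]
FAILURE_TYPES = ["TWF", "HDF", "PWF", "OSF", "RNF"]


def build_dimensions(df, failure):
    """Build the Parcoords axes: continuous features, Type, one failure flag."""
    dims = [dict(label=col, values=df[col].tolist()) for col in CONTINUOUS]
    # Encode Type as integer codes and show the category names as tick labels.
    type_codes = pd.Categorical(df["Type"], categories=TYPE_ORDER).codes.tolist()
    dims.append(dict(label="Type", values=type_codes,
                     tickvals=list(range(len(TYPE_ORDER))), ticktext=TYPE_ORDER))
    # The 0/1 failure flag gets two labelled ticks, so it reads as categorical.
    dims.append(dict(label=failure, values=df[failure].tolist(),
                     tickvals=[0, 1], ticktext=["no failure", failure]))
    return dims


def make_figure(df, failure):
    """Parallel coordinates figure with lines colored by the failure flag."""
    # Imported here so the data preparation above also works without plotly.
    import plotly.graph_objects as go

    line = dict(color=df[failure].tolist(), cmin=0, cmax=1,
                colorscale=[[0, "lightgray"], [1, "crimson"]])
    return go.Figure(go.Parcoords(line=line,
                                  dimensions=build_dimensions(df, failure)))


def explore(df, failure_types=FAILURE_TYPES):
    """Notebook-only helper: choose the failure type from a Dropdown."""
    import ipywidgets as widgets

    dropdown = widgets.Dropdown(options=list(failure_types),
                                description="Failure:")
    widgets.interact(lambda failure: make_figure(df, failure).show(),
                     failure=dropdown)


# Toy rows for illustration only (NOT taken from the dataset).
toy = pd.DataFrame({
    "Air temperature [K]": [298.1, 299.0, 300.2, 301.5],
    "Process temperature [K]": [308.6, 309.1, 310.0, 311.2],
    "Rotational speed [rpm]": [1550, 1400, 1300, 1450],
    "Torque [Nm]": [40.0, 46.0, 65.0, 60.0],
    "Tool wear [min]": [10, 100, 180, 200],
    "Type": ["L", "M", "H", "L"],
    "OSF": [0, 0, 0, 1],
})
dims = build_dimensions(toy, "OSF")
print([d["label"] for d in dims])
```

In a notebook, calling `explore(df)` on the full DataFrame would show the Dropdown and redraw the figure whenever another failure type is selected. `make_figure(df, "OSF").show()` displays a single figure without the widget.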