A common and useful application of data mining is finding some groups more likely to do something, such as customers most likely to buy the product or components most likely to fail. Excel’s PivotTable feature makes it remarkably easy to do. You don’t need extra tools. You already have Excel on your computer. You can do data mining with PivotTables.
Here’s an example based on real data. A bank wants to limit the number of their customers who are behind on paying on their loans. This is called going into default. The bank has data on each of their tens of thousands of customers–occupation, education, age, marital status, type of loans and more. In particular, they know who is in default (“yes”) and who isn’t (“no”).
PivotTables allow them to try various demographic categories to find those most likely to be in default. The trick is to code the defaults as 1 and 0 rather than “yes” and “no”. Then the sum and average give answers.
What the bank learned using PivotTables
Here’s what they found after a few quick trials. Those who are widowed or divorced are more likely to go into default. This indicates customers may need help or forbearance after a major life change. The bank also found that entrepreneurs are more than twice as likely to default as the general population, as seen in this PivotTable.
Note how the sum, count and average applied to the 0-1 value of whether or not they defaulted gives the total number of entrepreneurs who defaulted, the total number of entrepreneurs among the customers and, most importantly, the percentage of entrepreneurs who defaulted. At 4.17%, that’s more than twice the 1.68% of the customers overall, and much higher than any other occupation category.
This is an actionable insight for the bank. Don’t believe the hype the entrepreneurs are giving you about how much money they are going to make.