machinelearninginaction
Maybe an error in the book
Chapter 11, "Association analysis with the Apriori algorithm", page 234: "In section 11.3, we quantified an itemset as frequent if it met our minimum support level. We have a similar measurement for association rules. This measurement is called the confidence. The confidence for a rule P ➞ H is defined as support(P | H)/support(P). Remember, in Python, the | symbol is the set union; the mathematical symbol is ∪. P | H means all the items in set P or in set H. We calculated the support for all the frequent itemsets in the previous section. Now, when we want to calculate the confidence, all we have to do is call up those support values and do one divide."
Is this paragraph wrong? I think it contradicts Listing 11.3 (Association rule-generation functions) and this passage on page 226: "The confidence is defined for an association rule like {diapers} ➞ {wine}. The confidence for this rule is defined as support({diapers, wine})/support({diapers}). From figure 11.1, the support of {diapers, wine} is 3/5. The support for diapers is 4/5, so the confidence for diapers ➞ wine is 3/4 = 0.75. That means that in 75% of the items in our dataset containing diapers, our rule is correct."
So I think the text on page 234 should read as follows: "The confidence for a rule P ➞ H is defined as support(P , H)/support(P). Remember, in Python, the , symbol is the set INTERSECTION (not union); the mathematical symbol is ∩. P , H means all the items in set P AND (not or) in set H."
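To make the question concrete, here is a minimal sketch that computes the ratio both ways for {diapers} ➞ {wine}. The five transactions are an assumption: they are chosen only so that the supports match the 3/5 and 4/5 quoted from page 226, and the helper names `support` and `confidence_ratio` are hypothetical, not taken from Listing 11.3. It uses Python's `|` for union and `&` for intersection on item sets.

```python
from fractions import Fraction

# Assumed toy data: built so support({diapers, wine}) = 3/5 and
# support({diapers}) = 4/5, the values quoted from page 226.
transactions = [
    {"soy milk", "lettuce"},
    {"lettuce", "diapers", "wine", "chard"},
    {"soy milk", "diapers", "wine", "orange juice"},
    {"lettuce", "soy milk", "diapers", "wine"},
    {"lettuce", "soy milk", "diapers", "orange juice"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence_ratio(combined, antecedent, transactions):
    """support(combined) / support(antecedent), as an exact fraction."""
    return support(combined, transactions) / support(antecedent, transactions)

P = frozenset({"diapers"})
H = frozenset({"wine"})

# Union of the item sets: P | H == {"diapers", "wine"}
union_conf = confidence_ratio(P | H, P, transactions)

# Intersection of the item sets: P & H is the empty set here,
# and the empty set is a subset of every transaction.
intersection_conf = confidence_ratio(P & H, P, transactions)

print(P | H, union_conf)
print(P & H, intersection_conf)
```

On this data the union reading gives 3/4, the same number as on page 226, because P | H is the itemset {diapers, wine} and its support counts the transactions that contain both items. The intersection of the two item sets is empty, so its support is 1 and the ratio comes out as 5/4.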
Am I right or wrong?