fp-growth
Rules Being Overwritten
Hi all, while working on this for my own project I found a problem in the generate_association_rules code, specifically with the line below:
if confidence >= confidence_threshold:
    rules[antecedent] = (consequent, confidence)
This effectively allows only one rule per item (antecedent), so where higher-order rules exist for an item they overwrite the simple 1 -> 1 rules. I addressed this by changing the code so that, instead of setting the rules dict value each time, it appends the new rule as a list to the existing dict value whenever a rule already exists for that antecedent, as follows:
if confidence >= confidence_threshold:
    rule1 = (consequent, confidence)
    rule1 = list(rule1)
    if antecedent in rules:
        rules[antecedent].append(rule1)
    else:
        rules[antecedent] = rule1
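To make the overwrite concrete, here is a small self-contained sketch. The candidate rules and confidence values are invented for illustration, and `candidates` is a hypothetical stand-in for whatever the package actually iterates over, so treat it as a toy reproduction rather than the package's code.

```python
# Toy illustration: the rule data below is made up, not produced by fp-growth.
# `candidates` is a hypothetical list of (antecedent, consequent, confidence).
candidates = [
    (("bread",), ("butter",), 0.8),
    (("bread",), ("butter", "jam"), 0.6),
]
confidence_threshold = 0.5

# Original behaviour: plain assignment keeps only the last rule per antecedent.
overwritten = {}
for antecedent, consequent, confidence in candidates:
    if confidence >= confidence_threshold:
        overwritten[antecedent] = (consequent, confidence)

# Patched behaviour from this post: the first rule is stored flat,
# later rules are appended as [consequent, confidence] lists.
patched = {}
for antecedent, consequent, confidence in candidates:
    if confidence >= confidence_threshold:
        rule1 = list((consequent, confidence))
        if antecedent in patched:
            patched[antecedent].append(rule1)
        else:
            patched[antecedent] = rule1

print(overwritten)  # one rule survives for ("bread",)
print(patched)      # both rules are kept for ("bread",)
```

With plain assignment the 0.8 rule is lost; with the append variant both rules remain under the same key.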
The rules of course then have to be unravelled in a slightly more complex manner, but this worked well. The first rule for a key is stored flat: value[0] is its consequent and value[1] is its confidence. Subsequent rules are stored as [consequent, confidence] lists in value[n] for n >= 2, so the unravelling is as follows, where prel is the antecedent list, postl is the consequent list and confl is the confidence list:
for key, value in rules.iteritems():
    if len(value) > 2:  # If item has more than 1 rule
        for i in range(2, len(value)):  # For rule 2 and subsequent rules
            prel.append(key)
            postl.append(value[i][0])
            confl.append(value[i][1])
    prel.append(key)
    postl.append(value[0])
    confl.append(value[1])
I'm sure there's a neater way of resolving this issue, but it is an important constraint on the scope of the algorithm. The algorithm is super fast and excellent otherwise, and it would be a pity if it were not being used for this reason. Comments welcome. It would be great if the code could be updated.
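One possibly neater variant, sketched under assumptions and not tested against the package itself: keep every rule for an antecedent in a uniform list of (consequent, confidence) tuples, so no special case is needed for the first rule. The helper names `collect_rules` and `flatten_rules`, and the `candidates` input, are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def collect_rules(candidates, confidence_threshold):
    """Group rules by antecedent; every value is a list of (consequent, confidence)."""
    rules = defaultdict(list)
    for antecedent, consequent, confidence in candidates:
        if confidence >= confidence_threshold:
            rules[antecedent].append((consequent, confidence))
    return rules


def flatten_rules(rules):
    """Unravel the grouped rules into three parallel lists."""
    prel, postl, confl = [], [], []
    for antecedent, rule_list in rules.items():
        for consequent, confidence in rule_list:
            prel.append(antecedent)
            postl.append(consequent)
            confl.append(confidence)
    return prel, postl, confl


# Made-up example data, for illustration only.
candidates = [
    (("bread",), ("butter",), 0.8),
    (("bread",), ("butter", "jam"), 0.6),
    (("milk",), ("eggs",), 0.3),  # below the threshold, so it is dropped
]
rules = collect_rules(candidates, 0.5)
prel, postl, confl = flatten_rules(rules)
```

Because every value has the same shape, the unravelling becomes a plain nested loop. The trade-off is that the returned dict no longer maps an antecedent to a single (consequent, confidence) pair, so any existing callers relying on that shape would need updating.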
@Rbain2 I had the same problem, and it seems nothing has changed since your post. Could you make a pull request with your change? The author seems not to have time for development of this package, but maybe he could accept this correction.