fp-growth
Items missing in the output
`import pyfpgrowth
item_list = [[11, 13], [4, 12, 13], [4, 13, 14, 17], [7, 11, 13, 14, 17], [2, 4, 13, 14], [13, 14], [2, 7, 11, 13, 14], [4, 13, 14, 15], [6, 11, 13], [11, 13], [11, 13, 14, 17], [7, 11, 13, 14], [2, 11, 12, 13, 14], [7, 11, 13], [11, 13], [7, 8, 13, 14, 17], [0, 7, 11, 13], [2, 11, 13, 14, 15], [7, 11, 12, 13, 14], [11, 12, 13, 14]]
patterns = pyfpgrowth.find_frequent_patterns(item_list, 1)`
My code and dataset are defined as above. The problem is that when I run `patterns = pyfpgrowth.find_frequent_patterns(item_list[:15], 1)`, the key `(7,)` is in `patterns`, but when I use the whole `item_list`, the key `(7,)` is missing. Is this a bug, or am I misunderstanding the algorithm?
Thanks.
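One way to check whether `(7,)` should be present, independently of pyfpgrowth, is to count item supports by brute force. The sketch below is only a sanity check: the helper name `single_item_support` is made up for illustration, and it assumes the second argument to `find_frequent_patterns` is a minimum support count.

```python
from collections import Counter

item_list = [[11, 13], [4, 12, 13], [4, 13, 14, 17], [7, 11, 13, 14, 17],
             [2, 4, 13, 14], [13, 14], [2, 7, 11, 13, 14], [4, 13, 14, 15],
             [6, 11, 13], [11, 13], [11, 13, 14, 17], [7, 11, 13, 14],
             [2, 11, 12, 13, 14], [7, 11, 13], [11, 13], [7, 8, 13, 14, 17],
             [0, 7, 11, 13], [2, 11, 13, 14, 15], [7, 11, 12, 13, 14],
             [11, 12, 13, 14]]


def single_item_support(transactions):
    """Count how many transactions contain each item (hypothetical helper)."""
    counts = Counter()
    for transaction in transactions:
        # set() so an item repeated inside one transaction is counted once
        counts.update(set(transaction))
    return counts


support = single_item_support(item_list)
print(support[7])
```

Item 7 occurs in 7 of the 20 transactions, so with a threshold of 1 a key `(7,)` would be expected in the result for the full list as well.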
I'm experiencing a similar issue of missing patterns - so far, only for patterns of length 1. Too much data to post here, but the project can be found here. Here's the issue I came across, though:
I get valid patterns of:
('carrots', 'lettuce', 'garlic'): 10
('carrots', 'lettuce'): 10
('carrots', 'garlic'): 13
('lettuce', 'garlic'): 10
yet there are no single-item patterns for ('carrots',), ('lettuce',), or ('garlic',). Given that any sub-pattern should be at least as frequent as its parent, I would expect all three to show up, each with a count at least as high as the patterns that contain it.
Is there a way to correct this? Thank you for the package and support!
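The property mentioned above (every sub-pattern is at least as frequent as its parent) can be checked mechanically on a result dict. The following is a rough sketch, not part of pyfpgrowth; `find_closure_violations` is a hypothetical helper that assumes the patterns are a dict mapping item tuples to counts, as in the output shown above. It only looks at subsets that are one item shorter than each pattern.

```python
from itertools import combinations


def find_closure_violations(patterns):
    """Return {sub_pattern: minimum_expected_count} for every immediate
    sub-pattern that is missing or has a smaller count than a parent."""
    # Compare itemsets as frozensets so tuple order does not matter.
    by_set = {frozenset(key): count for key, count in patterns.items()}
    required = {}
    for itemset, count in by_set.items():
        if len(itemset) < 2:
            continue
        for subset in combinations(sorted(itemset, key=str), len(itemset) - 1):
            key = frozenset(subset)
            # A subset must be at least as frequent as its most frequent parent.
            required[key] = max(required.get(key, 0), count)
    return {
        tuple(sorted(subset, key=str)): need
        for subset, need in required.items()
        if by_set.get(subset, 0) < need
    }


patterns = {
    ('carrots', 'lettuce', 'garlic'): 10,
    ('carrots', 'lettuce'): 10,
    ('carrots', 'garlic'): 13,
    ('lettuce', 'garlic'): 10,
}
for sub_pattern, need in find_closure_violations(patterns).items():
    print(sub_pattern, "should appear with a count of at least", need)
```

For the four patterns above, this flags the three single-item patterns as missing, which matches the behaviour described in this comment.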
I've run into the same issue now. @ZaxR, @yanxiang007, did you find any project that can do this job? Otherwise I want to fix this bug.
I ended up using Christian Borgelt's PyFIM; however, I still think this implementation is worth fixing/upgrading for a few reasons:
- The association rules option only produces association rules with a single consequent (what I personally needed, but not "complete").
- The implementation above is intended more as a command-line program, and figuring out some of the parameter options in Python (even with 'help') takes a bit of guesswork (particularly because it's compiled from C++).
- pip install is nice...
Guys, Thanks for your engagement with this. To be perfectly honest, I didn't know what I was doing when I put this project up here. I need to clear a block of time to go through bug reports and pull requests and push tests, fixes and a new release to PyPI.
If anyone wants to help, shout!
Unfortunately, no...
Hey guys, I fixed one of the problems in my own repo. It now shows the "7" itemset that @yanxiang007 mentioned, but some other bugs remain.
I am using orange3-associate as an alternative. I hope this information helps.
Hi all, while working on this issue for my own project, I found a problem in the generate_association_rules function, specifically with the lines below:
if confidence >= confidence_threshold:
    rules[antecedent] = (consequent, confidence)
This effectively allows only one rule per antecedent, so where higher-order rules exist for an item, they overwrite the simple 1 -> 1 rules. I addressed this by changing the code so that, instead of setting the dict value each time, a new rule is appended as a list to the existing dict value if a rule already exists for that item:
if confidence >= confidence_threshold:
    rule1 = (consequent, confidence)
    rule1 = list(rule1)
    if antecedent in rules:
        rules[antecedent].append(rule1)
    else:
        rules[antecedent] = rule1
The rules then have to be unravelled in a slightly more complex manner, but this worked well. The first rule is value[0] -> value[1]. Subsequent rules are stored as lists in value[n] for n >= 2, so the unravelling is as follows, where prel is the antecedent list, postl is the consequent list, and confl is the confidence list:
prel, postl, confl = [], [], []
for key, value in rules.items():  # rules.iteritems() on Python 2
    if len(value) > 2:  # item has more than one rule
        for i in range(2, len(value)):  # rule 2 and subsequent rules
            prel.append(key)
            postl.append(value[i][0])
            confl.append(value[i][1])
    # the first rule is stored flat as value[0] -> value[1]
    prel.append(key)
    postl.append(value[0])
    confl.append(value[1])
hope this helps
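For comparison, the same idea (keeping every rule for an antecedent instead of overwriting) could also be expressed with a list per antecedent, which avoids treating the first rule differently from later ones. This is only a sketch and not pyfpgrowth's code: `collect_rules`, `flatten_rules`, and the `(antecedent, consequent, confidence)` input shape are assumptions made for illustration.

```python
from collections import defaultdict


def collect_rules(candidates, confidence_threshold):
    """Keep every (consequent, confidence) pair per antecedent.

    `candidates` is an iterable of (antecedent, consequent, confidence)
    tuples; this input shape is an assumption, not the library's API.
    """
    rules = defaultdict(list)
    for antecedent, consequent, confidence in candidates:
        if confidence >= confidence_threshold:
            # append instead of assign, so earlier rules are not overwritten
            rules[antecedent].append((consequent, confidence))
    return dict(rules)


def flatten_rules(rules):
    """Unravel into a flat list of (antecedent, consequent, confidence)."""
    return [(antecedent, consequent, confidence)
            for antecedent, pairs in rules.items()
            for consequent, confidence in pairs]


# Made-up candidate rules, purely for illustration.
candidates = [((1,), (2,), 0.9), ((1,), (2, 3), 0.8), ((4,), (5,), 0.5)]
rules = collect_rules(candidates, 0.7)
for antecedent, consequent, confidence in flatten_rules(rules):
    print(antecedent, "->", consequent, confidence)
```

With a uniform list of pairs, the unravelling step becomes a single loop with no special case for the first rule.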