fp-growth Items missing in the output

```python
import pyfpgrowth

item_list = [[11, 13], [4, 12, 13], [4, 13, 14, 17], [7, 11, 13, 14, 17], [2, 4, 13, 14], [13, 14], [2, 7, 11, 13, 14], [4, 13, 14, 15], [6, 11, 13], [11, 13], [11, 13, 14, 17], [7, 11, 13, 14], [2, 11, 12, 13, 14], [7, 11, 13], [11, 13], [7, 8, 13, 14, 17], [0, 7, 11, 13], [2, 11, 13, 14, 15], [7, 11, 12, 13, 14], [11, 12, 13, 14]]

patterns = pyfpgrowth.find_frequent_patterns(item_list, 1)
```

My code and dataset are defined as above. The problem is that when I run `patterns = pyfpgrowth.find_frequent_patterns(item_list[:15], 1)`, I can find the key `(7,)` in `patterns`, but when I use the whole `item_list`, the key `(7,)` is missing. Is this a bug, or am I misunderstanding something about the algorithm?
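One way to tell whether this is a library bug is to compute the expected supports independently. The sketch below is a brute-force reference miner, not part of pyfpgrowth: `brute_force_patterns` is a hypothetical helper that counts every non-empty sub-itemset of every transaction, which is only practical for short transactions like these. With a threshold of 1, every item that occurs anywhere, including 7, should appear as a frequent 1-itemset. It assumes keys are sorted tuples, matching the `(7,)` style of key in the question.

```python
from collections import Counter
from itertools import combinations

# Hypothetical helper (not part of pyfpgrowth): exact support of every itemset
# that occurs in at least `min_support` transactions, found by brute force.
def brute_force_patterns(transactions, min_support):
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # dedupe and order items within a transaction
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1  # each sorted tuple counted once per transaction
    return {itemset: n for itemset, n in counts.items() if n >= min_support}

item_list = [[11, 13], [4, 12, 13], [4, 13, 14, 17], [7, 11, 13, 14, 17], [2, 4, 13, 14], [13, 14], [2, 7, 11, 13, 14], [4, 13, 14, 15], [6, 11, 13], [11, 13], [11, 13, 14, 17], [7, 11, 13, 14], [2, 11, 12, 13, 14], [7, 11, 13], [11, 13], [7, 8, 13, 14, 17], [0, 7, 11, 13], [2, 11, 13, 14, 15], [7, 11, 12, 13, 14], [11, 12, 13, 14]]

expected = brute_force_patterns(item_list, 1)
print(expected[(7,)])                                  # 7 (all 20 transactions)
print(brute_force_patterns(item_list[:15], 1)[(7,)])   # 4 (first 15 transactions)
```

With pyfpgrowth installed, and assuming its keys are sorted tuples as well, `set(expected) - set(patterns)` would list the itemsets the library dropped.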

Thanks.

Jun 01 '17 03:06 yanxiang007

I'm experiencing a similar issue with missing patterns; so far, only for patterns of length 1. There's too much data to post here, but the project can be found here. Here's the issue I came across:

I get valid patterns of:

- `('carrots', 'lettuce', 'garlic')`: 10
- `('carrots', 'lettuce')`: 10
- `('carrots', 'garlic')`: 13
- `('lettuce', 'garlic')`: 10

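yet there are no single-item patterns for `('carrots',)`, `('lettuce',)`, or `('garlic',)`. Given that any sub-pattern should be at least as frequent as its parent, I would expect all three to show up, with counts at least as high as those of the patterns containing them (e.g. at least 13 for `('carrots',)` and `('garlic',)`).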
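The property relied on here is the downward-closure (anti-monotone) property of support: every non-empty subset of a frequent itemset occurs in at least as many transactions as the itemset itself. A minimal sketch of a hypothetical `check_downward_closure` helper (not part of pyfpgrowth) can flag output that violates it. It compares itemsets as frozensets, because the tuples above are not in a consistent order.

```python
from itertools import combinations

# Hypothetical helper: report sub-itemsets that are missing from a pattern dict,
# or whose count is lower than that of an itemset containing them.
def check_downward_closure(patterns):
    support = {frozenset(k): v for k, v in patterns.items()}  # order-insensitive keys
    problems = set()  # a set, so a subset reported by several parents appears once
    for itemset, count in support.items():
        for size in range(1, len(itemset)):
            for sub in combinations(sorted(itemset), size):
                key = frozenset(sub)
                if key not in support:
                    problems.add((sub, "missing"))
                elif support[key] < count:
                    problems.add((sub, "count too low"))
    return sorted(problems)

zaxr_output = {
    ('carrots', 'lettuce', 'garlic'): 10,
    ('carrots', 'lettuce'): 10,
    ('carrots', 'garlic'): 13,
    ('lettuce', 'garlic'): 10,
}
print(check_downward_closure(zaxr_output))
# [(('carrots',), 'missing'), (('garlic',), 'missing'), (('lettuce',), 'missing')]
```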

Is there a way to correct this? Thank you for the package and support!

Oct 11 '17 15:10 ZaxR

I'm running into the same issue now. @ZaxR, @yanxiang007, did you find another project that can do this job? Otherwise, I'd like to fix this bug.

Oct 24 '17 14:10 Billy4195

I ended up using Christian Borgelt's PyFIM; however, I still think this implementation is worth fixing/upgrading for a few reasons:

- PyFIM's association rules option only produces rules with a single consequent (what I personally needed, but not "complete").
- PyFIM is intended more as a command-line program, and figuring out some of the parameter options in Python (even with `help`) takes a bit of guesswork, particularly because it's compiled from C++.
- `pip install` is nice...

Oct 25 '17 14:10 ZaxR

Guys, thanks for your engagement with this. To be perfectly honest, I didn't know what I was doing when I put this project up here. I need to clear a block of time to go through bug reports and pull requests, and push tests, fixes, and a new release to PyPI.

If anyone wants to help, shout!

Oct 25 '17 20:10 evandempsey

Unfortunately, no...


Oct 26 '17 04:10 yanxiang007

Hey guys, I've fixed one of the problems in my own repo; it now shows the `(7,)` itemset that @yanxiang007 mentioned, but some other bugs remain.

I am using orange3-associate as an alternative. Hope this information helps.

Oct 26 '17 14:10 Billy4195

Hi all, while working on this issue for my own work, I found a problem in the `generate_association_rules` function, specifically with the line below:

```python
if confidence >= confidence_threshold:
    rules[antecedent] = (consequent, confidence)
```

This effectively allows only one rule per antecedent: where an antecedent has several rules (including higher-order ones), each new rule overwrites the previous one, so the simple 1 -> 1 rules get lost. I addressed this by changing the code so that, instead of overwriting the dict value each time, if a rule already exists for an antecedent, I append the new rule as a list to the dict value, as follows:

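```python
if confidence >= confidence_threshold:
    # store the rule as a [consequent, confidence] list
    rule1 = list((consequent, confidence))
    if antecedent in rules:
        # antecedent already has a rule: append the new one after it
        rules[antecedent].append(rule1)
    else:
        # first rule for this antecedent: the value itself
        # becomes [consequent, confidence]
        rules[antecedent] = rule1
```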

The rules then, of course, have to be unravelled in a slightly more complex manner, but this worked well. The first rule's consequent and confidence are `value[0]` and `value[1]`; subsequent rules are stored as `[consequent, confidence]` lists in `value[2]`, `value[3]`, and so on. The unravelling is as follows, where `prel` is the antecedent list, `postl` is the consequent list, and `confl` is the confidence list:

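```python
prel, postl, confl = [], [], []  # antecedents, consequents, confidences

for key, value in rules.items():  # .iteritems() only exists in Python 2
    if len(value) > 2:  # the antecedent has more than one rule
        for i in range(2, len(value)):  # rule 2 and subsequent rules
            prel.append(key)
            postl.append(value[i][0])
            confl.append(value[i][1])

    # first rule, stored directly as [consequent, confidence]
    prel.append(key)
    postl.append(value[0])
    confl.append(value[1])
```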
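A simpler storage layout for the same idea is to map each antecedent to a list of `(consequent, confidence)` pairs from the start, so unravelling needs no special case for the first rule. The functions below are a hypothetical standalone sketch, not pyfpgrowth's `generate_association_rules`; they assume `patterns` maps sorted item tuples to support counts and compute confidence as support(antecedent ∪ consequent) / support(antecedent).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sketch: keep every rule that passes the threshold instead of
# overwriting earlier rules that share the same antecedent.
def generate_rules_multi(patterns, confidence_threshold):
    rules = defaultdict(list)  # antecedent -> [(consequent, confidence), ...]
    for itemset, itemset_support in patterns.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                if antecedent not in patterns:
                    continue  # antecedent support unknown, so confidence can't be computed
                consequent = tuple(sorted(set(itemset) - set(antecedent)))
                confidence = itemset_support / patterns[antecedent]
                if confidence >= confidence_threshold:
                    rules[antecedent].append((consequent, confidence))
    return dict(rules)

# Hypothetical helper: flatten into parallel lists like prel/postl/confl.
def flatten_rules(rules):
    prel, postl, confl = [], [], []
    for antecedent, rule_list in rules.items():
        for consequent, confidence in rule_list:
            prel.append(antecedent)
            postl.append(consequent)
            confl.append(confidence)
    return prel, postl, confl

patterns = {(1,): 4, (2,): 4, (3,): 2, (1, 2): 4, (1, 3): 2}
rules = generate_rules_multi(patterns, 0.5)
print(rules[(1,)])  # [((2,), 1.0), ((3,), 0.5)]: both rules for (1,) survive
```

With a plain `rules[antecedent] = ...` assignment, only one of the two rules for `(1,)` would be kept.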

Hope this helps.

Apr 30 '18 13:04 Rbain2
