Difference between fpgrowth and fpmax not documented
Describe the documentation issue
Hi. I'm using the library to find association rules in a dataset. To do that, I pass the output of the three algorithms to the association_rules() function. The documentation says these are equivalent in terms of parameters and output, but I get the following error only with the output from fpmax():
KeyError: 'frozenset({120})You are likely getting this error because the DataFrame is missing antecedent and/or consequent information. You can try using the `support_only=True` option'
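The error is consistent with how rule metrics are computed: the confidence of a rule A -> C is support(A ∪ C) / support(A), so the support of the antecedent A (and, for lift, of the consequent C) has to be looked up in the frequent-itemset table. The sketch below is a simplification, not mlxtend's code: it assumes the table is a plain dict keyed by frozenset, and the item IDs and support values are hypothetical (120 echoes the error message, 7 is made up).

```python
from itertools import combinations


def rule_confidences(support):
    """Compute confidence for every rule A -> C derivable from the itemsets in `support`.

    `support` maps frozenset itemsets to their support (fraction of transactions).
    Raises KeyError if the support of an antecedent is missing from the table.
    """
    rules = {}
    for itemset, s_union in support.items():
        if len(itemset) < 2:
            continue  # a single item cannot be split into antecedent and consequent
        # Every non-empty proper subset of the itemset can serve as an antecedent.
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                consequent = itemset - a
                # Confidence needs support(A), which must be in the table.
                rules[(a, consequent)] = s_union / support[a]
    return rules


# Hypothetical table containing every frequent itemset (subsets included).
complete = {
    frozenset({120}): 0.5,
    frozenset({7}): 0.4,
    frozenset({7, 120}): 0.3,
}
# Hypothetical table containing only the maximal itemset.
maximal_only = {frozenset({7, 120}): 0.3}

print(rule_confidences(complete))
try:
    rule_confidences(maximal_only)
except KeyError as err:
    print("KeyError:", err)
```

With the complete table both rules get a confidence; with the maximal-only table the lookup of a single-item antecedent fails with a KeyError on a frozenset, which matches the shape of the reported message.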
A minimal code example of my implementation would be like
from mlxtend.frequent_patterns import association_rules, fpgrowth, fpmax
### Assume baskets_matrix is an ad hoc pandas DataFrame.
### This works OK
freq_items_1 = fpgrowth(baskets_matrix, min_support=0.1)
freq_items_2 = fpmax(baskets_matrix, min_support=0.1)
### This also works OK
AR_1 = association_rules(freq_items_1, metric="confidence", min_threshold=0.5)
### This raises the error
AR_2 = association_rules(freq_items_2, metric="confidence", min_threshold=0.5)
Since all other factors are the same, I have to assume that there is a difference in the output of fpgrowth and fpmax which is not clearly documented.
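One way to check for a difference in output is to test whether a result table is closed under subsets, since rule generation needs every non-empty subset of each itemset to be present. The helper below is hypothetical (it is not part of mlxtend); it takes the itemsets as an iterable of frozensets and returns the subsets that are missing. For the output of fpgrowth it should come back empty, because every subset of a frequent itemset is itself frequent; for the output of fpmax it generally will not.

```python
from itertools import combinations


def missing_subsets(itemsets):
    """Return the non-empty proper subsets of `itemsets` that are not themselves listed."""
    present = set(itemsets)
    missing = set()
    for itemset in present:
        for k in range(1, len(itemset)):
            for subset in combinations(itemset, k):
                if frozenset(subset) not in present:
                    missing.add(frozenset(subset))
    return missing


# Hypothetical fpgrowth-style output: all frequent itemsets.
all_frequent = [frozenset(s) for s in ({"a"}, {"b"}, {"c"}, {"a", "b"})]
# Hypothetical fpmax-style output: only the maximal ones.
maximal = [frozenset({"a", "b"}), frozenset({"c"})]

print(missing_subsets(all_frequent))  # set()
print(missing_subsets(maximal))
```

Both functions return a DataFrame whose `itemsets` column holds frozensets, so a call such as `missing_subsets(freq_items_2["itemsets"])` should work on the real output as well.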
I also noticed that the documentation refers to the association_rules() function as generate_rules(), which adds to the confusion.
Suggest a potential improvement or addition
I would like to ask whether it's possible to clarify if the outputs of the different algorithms are indeed different, or whether there is another issue here.
Also, I think it would be useful for anyone using the library to have these remarks added to the documentation.
Thanks in advance!
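As a possible workaround in the meantime (an assumption, not something the documentation describes), the support of every non-empty subset of each maximal itemset can be recounted from the transactions. By the downward-closure property every such subset is frequent, and every frequent itemset is contained in some maximal one, so the rebuilt table should contain the same itemsets an exhaustive miner like fpgrowth reports; in practice simply running fpgrowth is the easier route. The transactions below are hypothetical and modeled as a list of Python sets rather than a one-hot DataFrame.

```python
from itertools import combinations


def expand_maximal(maximal_itemsets, transactions):
    """Recount support for every non-empty subset of the given maximal itemsets.

    `transactions` is a list of sets of items; support is the fraction of
    transactions that contain the itemset.
    """
    n = len(transactions)
    support = {}
    for itemset in maximal_itemsets:
        for k in range(1, len(itemset) + 1):
            for subset in combinations(sorted(itemset), k):
                key = frozenset(subset)
                if key not in support:
                    # Count transactions that contain every item of `key`.
                    support[key] = sum(key <= t for t in transactions) / n
    return support


# Hypothetical transactions and their maximal itemsets for min_support = 0.5.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
maximal = [frozenset({"a", "b"}), frozenset({"a", "c"})]

table = expand_maximal(maximal, transactions)
for itemset in sorted(table, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), table[itemset])
```

The rebuilt table includes the single items {a}, {b} and {c}, which are exactly the antecedents a confidence computation needs.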
As per the documentation: "FP-Max is a variant of FP-Growth, which focuses on obtaining maximal itemsets. An itemset X is said to [be] maximal if X is frequent and there exists no frequent super-pattern containing X. In other words, a frequent pattern X cannot be sub-pattern of larger frequent pattern to qualify for the definition maximal itemset." That being said, I also get the error when using FP-Max.
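The quoted definition can be made concrete with a brute-force sketch (for illustration only; it is not how FP-Growth or FP-Max work internally): enumerate every frequent itemset, then keep only those with no frequent proper superset. The transactions are hypothetical. The first result corresponds conceptually to what fpgrowth reports and the second to what fpmax reports; subsets such as {c} disappear from the second, so their support is no longer available for rule generation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets whose support >= min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
    return result


def maximal_itemsets(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return {
        itemset: s
        for itemset, s in frequent.items()
        if not any(itemset < other for other in frequent)
    }


transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
frequent = frequent_itemsets(transactions, min_support=0.5)
maximal = maximal_itemsets(frequent)

print(len(frequent), "frequent itemsets:", sorted(sorted(s) for s in frequent))
print(len(maximal), "maximal itemsets:", sorted(sorted(s) for s in maximal))
```

The enumeration is exponential in the number of items, which is exactly what FP-Growth and FP-Max avoid; it is used here only because it follows the definition line by line.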
Same here: mining frequent itemsets with FP-Growth works fine, but with FP-Max I get the same error. An example of my code is:
# Assume negated is a one-hot encoded DataFrame
from mlxtend.frequent_patterns import association_rules, fpgrowth, fpmax
max_itemsets = fpmax(negated, min_support=0.3, use_colnames=True, max_len=5)
max_rules = association_rules(max_itemsets, metric="confidence", min_threshold=0.85)  # Error appears here
Works well
freq_itemsets = fpgrowth(negated, min_support=0.3, use_colnames=True, max_len=5)
freq_rules = association_rules(freq_itemsets, metric="confidence", min_threshold=0.85)