mlxtend
Fpgrowth fails with only one transaction
I have a large dataset of real data. After several attempts, I found that execution fails on one particular transaction. I isolated that transaction and re-ran the algorithm on it alone, and it fails every time. I can't understand why it fails at this point, even when the transaction is the only input.
Example:
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns.fpgrowth import fpgrowth
import pandas as pd

transactions = [[
    114367, 116953, 123213, 125589, 128047, 128579, 130407, 132025, 132082,
    134190, 136097, 136098, 136181, 136357, 136656, 136658, 136659, 136992,
    137180, 137181, 137395, 138215, 139339, 139520, 139551, 140008, 140012,
    140021
]]

def get_fpgrowth_associated_products(product_name):
    # filter out transactions that don't include the target product
    filtered_transactions = [t for t in transactions if product_name in t]
    te = TransactionEncoder()
    te_ary = te.fit(filtered_transactions).transform(filtered_transactions)
    # Convert the one-hot encoded array into a pandas DataFrame
    df = pd.DataFrame(te_ary, columns=te.columns_)
    # Compute frequent itemsets using the FP-growth algorithm (min_support = 0.5)
    freq_itemsets = fpgrowth(df, min_support=0.5, use_colnames=True)
    itemsets = set(freq_itemsets.itemsets)
    # find the sets that include the target product
    target_sets = [s for s in itemsets if product_name in s]
    # combine the other items from those sets into a single set
    associated_items = set()
    for s in target_sets:
        associated_items |= s - {product_name}
    return list(associated_items)

get_fpgrowth_associated_products(136181)
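One observation that may be relevant, though I have not confirmed it is the cause: the isolated transaction has 28 items, and with a single transaction every item, and every combination of items, has support 1.0. With min_support=0.5 that would make every non-empty subset a frequent itemset. A rough count using only the standard library (this is a sketch of the arithmetic, not of what fpgrowth does internally):

```python
from math import comb

# Number of item IDs in the isolated transaction above.
n_items = 28

# With one transaction, every non-empty subset of its items has support 1.0,
# so each subset would pass min_support=0.5.
itemsets_by_size = {k: comb(n_items, k) for k in range(1, n_items + 1)}
total_itemsets = sum(itemsets_by_size.values())

print(total_itemsets)  # 268435455, i.e. 2**28 - 1
```

If this is what is happening, limiting the itemset length with the max_len parameter of fpgrowth should bound the size of the result; I have not verified that it avoids the failure here.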
Versions
MLxtend 0.22.0
Linux-5.19.0-43-generic-x86_64-with-glibc2.35
Python 3.8.16
Scikit-learn 1.2.2
NumPy 1.24.3
SciPy 1.9.3