mlxtend Apriori - I'm only interested in one product / consequent. Can I speed up the apriori algorithm?

Apriori - I'm only interested in one product / consequent. Can I speed up the apriori algorithm?

Open soliverc opened this issue 4 years ago • 6 comments

For example, if I'm only interested in items frequently bought with eggs, is there a way to avoid generating itemsets for every other possible combination? This takes a lot of time, and with the generated rule set I end up filtering the huge dataframe to find only rows where consequents == eggs, discarding everything else. Can I speed this process up if I'm only interested in one consequent?

Jan 22 '20 14:01 soliverc

Sorry, there's currently no option for that. A "must_contain=('some item', 'some other item', ...)" feature could be a useful enhancement though.

Jan 22 '20 15:01 rasbt

There are two things here. If you are only interested in frequent items containing some item, a "must_contain" type feature would be useful, and fairly straightforward to implement.

If you are interested in filtering on the consequent in rules, then that is more challenging. For example, say you are only interested in rules with consequent==eggs. You cannot mine only frequent patterns containing "eggs" because the rules rely on supports of other items. And, to get those, you need patterns which may not contain "eggs".

May 20 '20 03:05 harenbergsd

I agree with you @harenbergsd . In addition, while adding a check for whether frequent itemsets contain a particular item may look useful in practice, I am not sure if that would necessarily speed things up (compared to pruning the resulting data frame). I think the major thing that it would accomplish is reducing the memory footprint of the resulting dataframe, because it really is just a check of whether an already computed itemset is to be added to the dataframe or not.

May 20 '20 21:05 rasbt

I found this blog post which I believe talks about this sort of thing: https://wimleers.com/article/fp-growth-powered-association-rule-mining-with-support-for-constraints

Like you, I do wonder how much perf improvement you could get by doing this sort of thing. It's certainly an interesting idea. You have to imagine, if you are only interested in a few consequents, you end up doing a lot of extra work and tossing stuff a way. But, it's tricky to know what is the extra work. Nice research project, depending on existing literature in this area :)

May 23 '20 15:05 harenbergsd

The performance of MLXtend "apriori" is pretty poor. It does not implement the optimizations of the original Apriori algorithm such as the prefix join to avoid generating unnecessary candidates or the hash tree; but it is a quite naive version instead that generates way too many candidates. See #644 - the runtime of MLXtend was 2 minutes, of pyfim just 150 ms; so MLXtend was about 750x slower. Hence if performance is a problem for you, consider using the well-known much faster "fim" module by Christian Borgelt: http://www.borgelt.net/pyfim.html - or improve the Apriori of MLXtend and first add the missing nice optimizations that are part of the Apriori algorithm.

May 26 '20 11:05 kno10

Yes, https://github.com/rasbt/mlxtend/pull/646 should address that.

May 26 '20 15:05 rasbt

mlxtend mlxtend copied to clipboard

Apriori - I'm only interested in one product / consequent. Can I speed up the apriori algorithm?

mlxtend
mlxtend copied to clipboard