
Avoid rule overwriting and include rules having an empty right side

Open L-v-M opened this issue 5 years ago • 0 comments

This pull request aims to close issues #3, #6, #11, #12 and #15.

The function generate_association_rules(...) is revised in the following ways:

  1. The return value is changed to a dictionary of the form {(left_side, right_side): (support, confidence)}. This (a) avoids rule overwriting and (b) includes the support in the rules.
  2. The generated rules now also include rules with an empty right side, which, to the best of my knowledge, a correct implementation of the algorithm requires.
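
To illustrate the two changes listed above, here is a minimal, self-contained sketch of the new return shape. It is not the code from this pull request: the function name `sketch_association_rules` and the sample `patterns` are hypothetical, and it assumes that frequent patterns arrive as a dictionary mapping sorted item tuples to support counts, and that a rule with an empty right side is one whose left side is the whole itemset (so its confidence is 1 by definition).

```python
from itertools import combinations


def sketch_association_rules(patterns, confidence_threshold):
    """Illustrative only: build {(left_side, right_side): (support, confidence)}.

    `patterns` is assumed to map sorted item tuples to support counts.
    """
    rules = {}
    for itemset, support in patterns.items():
        # Sizes run up to len(itemset), so the last pass uses the whole
        # itemset as the left side and yields an empty right side.
        for size in range(1, len(itemset) + 1):
            for left_side in combinations(itemset, size):
                left_support = patterns.get(left_side)
                if not left_support:
                    continue  # left side is not a known frequent pattern
                confidence = support / left_support
                if confidence >= confidence_threshold:
                    right_side = tuple(i for i in itemset if i not in left_side)
                    # Keying on (left, right) keeps every rule; keying on the
                    # left side alone would let a later rule replace an earlier one.
                    rules[(left_side, right_side)] = (support, confidence)
    return rules


# Hypothetical support counts, chosen only to show the dictionary shape.
patterns = {("a",): 4, ("b",): 3, ("c",): 2, ("a", "b"): 3, ("a", "c"): 2}
rules = sketch_association_rules(patterns, 0.5)
```

In this sketch both `(("a",), ("b",))` and `(("a",), ("c",))` survive as separate keys. With a dictionary keyed only by the left side, the second rule would have overwritten the first.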

The unit tests are adapted to these changes. They now verify that this implementation produces the correct result for an example taken from the original paper describing the Apriori algorithm.

I am very confident that this implementation is correct, and it seems reasonably fast:

  1. I compared the result produced by this implementation on a large production data set with the result produced by an implementation of the Apriori algorithm I used previously. Both results were identical.
  2. This implementation was significantly faster than the implementation of the Apriori algorithm.

Thank you @evandempsey.

L-v-M, May 13 '19 13:05