Denis Barbier
More tests should be run, maybe with kosarak.dat; when the best option is decided, I will remove/squash/rearrange commits and update docstrings. IMHO the best option is with current head.
Okay, I ran some tests with kosarak.dat and `sparse=False`. For small inputs there is a noticeable overhead, caused by `np.asfortranarray`. All these tests were run with a...
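For illustration, the conversion cost is easy to see in isolation; a minimal sketch (the array shape here is made up for the example, not taken from the benchmark data):

```python
import time

import numpy as np

# Hypothetical small dense one-hot transaction array;
# NumPy allocates it C-contiguous (row-major) by default.
X = np.zeros((10_000, 100), dtype=bool)

t0 = time.perf_counter()
Xf = np.asfortranarray(X)  # copies the data into column-major layout
elapsed = time.perf_counter() - t0

# The result is a new Fortran-ordered copy; for small inputs this
# copy is the overhead that dominates the total running time.
assert Xf.flags["F_CONTIGUOUS"]
assert not np.shares_memory(X, Xf)
print(f"conversion took {elapsed * 1e3:.3f} ms")
```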
I just rebased but won't be able to take care of failures for the next 24h; feel free to push fixes.
There were indeed some bugs; because of these fixes, timings may be slightly different, so I will rerun the benchmarks in a few days.
I rearranged the commits; they look good now IMHO. As for a benchmark script, I do not know how to do that; there are many parameters: data files, sparse=True/False, column_major=True/False, and list of...
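One simple option would be a driver that sweeps the whole parameter grid; a sketch with a placeholder standing in for the real timed call (`run_benchmark` is hypothetical, not an existing function in the repository):

```python
import itertools


def run_benchmark(datafile, sparse, col_major):
    """Hypothetical placeholder: load `datafile`, run apriori with the
    given options, and return whatever timing data we want to report."""
    return (datafile, sparse, col_major)


# File names taken from the benchmarks discussed above.
data_files = ["T10I4D100K.dat.gz", "kosarak.dat"]
combos = itertools.product(data_files, (True, False), (True, False))
results = [run_benchmark(f, s, c) for f, s, c in combos]

# 2 files x 2 sparse settings x 2 col_major settings = 8 runs
print(len(results))  # 8
```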
Here are more benchmarks. In these tables, `s=` refers to the `sparse` variable, `c=` to `col_major`, and `T`/`F` stand for `True`/`False`. T10I4D100K.dat.gz has 100000 transactions and 870 items, low_memory=True min_support | 0.05 | 0.03...
Some remarks: - Processing times are much higher with sparse dataframes. I have not really investigated this issue; memory usage for the dense array is now very low (except for the...
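For reference, the dense footprint is easy to estimate from the dimensions quoted above (100000 transactions, 870 items, one byte per NumPy boolean):

```python
import numpy as np

# Dense boolean one-hot matrix with the T10I4D100K dimensions above
X = np.zeros((100_000, 870), dtype=bool)

# One byte per bool: 100000 * 870 bytes = 87 MB for the matrix itself
print(X.nbytes / 1e6)  # 87.0
```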
> Sorry for the sparse responses, I have been traveling over the holidays and am currently working on two manuscripts with submissions deadlines mid Jan. No worries, this issue is...
I do not have a clear understanding of this issue, but it looks like some indices are too large; you may have to call `apriori` with `low_memory=True` in your case....
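My guess (not verified against the mlxtend internals) is a 32-bit index overflow once the candidate-times-transaction product gets large; a minimal NumPy illustration of that failure mode, with made-up counts:

```python
import numpy as np

# Illustrative counts only; chosen so that their product
# (3_000_000_000) does not fit in a signed 32-bit integer.
n_candidates = np.array([30_000], dtype=np.int32)
n_transactions = np.array([100_000], dtype=np.int32)

flat = n_candidates * n_transactions          # wraps around in int32
exact = np.int64(n_candidates[0]) * np.int64(n_transactions[0])

# The wrapped index is negative, which would break any later lookup.
print(flat[0], exact)
```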
@kno10 Are your benchmark scripts and data sets available somewhere?