Denis Barbier
More tests should be run, maybe with kosarak.dat; when the best option is decided, I will remove/squash/rearrange commits and update docstrings. IMHO the best option is with current head.
Okay, I ran some tests with kosarak.dat and `sparse=False`. For small inputs there is a noticeable overhead, caused by `np.asfortranarray`. All these tests were run with a...
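For illustration, the conversion cost is easy to see in isolation; a minimal sketch (the array shape here is made up for the example, not taken from the benchmark data):

```python
import time

import numpy as np

# Hypothetical small dense one-hot transaction array;
# NumPy allocates it C-contiguous (row-major) by default.
X = np.zeros((10_000, 100), dtype=bool)

t0 = time.perf_counter()
Xf = np.asfortranarray(X)  # copies the data into column-major layout
elapsed = time.perf_counter() - t0

# The result is a new Fortran-ordered copy; for small inputs this
# copy is the overhead that dominates the total running time.
assert Xf.flags["F_CONTIGUOUS"]
assert not np.shares_memory(X, Xf)
print(f"conversion took {elapsed * 1e3:.3f} ms")
```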
I just rebased but won't be able to take care of failures for the next 24h; feel free to push fixes.
There were indeed some bugs; because of these fixes, timings may be slightly different, so I will rerun the benchmarks in a few days.
I rearranged the commits; they look good now IMHO. As for a benchmark script, I do not know how to do that; there are many parameters: data files, sparse=True/False, column_major=True/False, and list of...
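One simple option would be a driver that sweeps the whole parameter grid; a sketch with a placeholder standing in for the real timed call (`run_benchmark` is hypothetical, not an existing function in the repository):

```python
import itertools


def run_benchmark(datafile, sparse, col_major):
    """Hypothetical placeholder: load `datafile`, run apriori with the
    given options, and return whatever timing data we want to report."""
    return (datafile, sparse, col_major)


# File names taken from the benchmarks discussed above.
data_files = ["T10I4D100K.dat.gz", "kosarak.dat"]
combos = itertools.product(data_files, (True, False), (True, False))
results = [run_benchmark(f, s, c) for f, s, c in combos]

# 2 files x 2 sparse settings x 2 col_major settings = 8 runs
print(len(results))  # 8
```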
Here are more benchmarks. In these tables, `s=` refers to the `sparse` variable, `c=` to `col_major`, and `T`/`F` stand for `True`/`False`. T10I4D100K.dat.gz has 100000 transactions and 870 items, low_memory=True min_support | 0.05 | 0.03...
Some remarks: - Processing times are much higher with sparse dataframes. I have not really investigated this issue; memory usage for the dense array is now very low (except for the...
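For reference, the dense footprint is easy to estimate from the dimensions quoted above (100000 transactions, 870 items, one byte per NumPy boolean):

```python
import numpy as np

# Dense boolean one-hot matrix with the T10I4D100K dimensions above
X = np.zeros((100_000, 870), dtype=bool)

# One byte per bool: 100000 * 870 bytes = 87 MB for the matrix itself
print(X.nbytes / 1e6)  # 87.0
```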
> Sorry for the sparse responses, I have been traveling over the holidays and am currently working on two manuscripts with submissions deadlines mid Jan. No worries, this issue is...
I do not have a clear understanding of this issue, but it looks like some indices are too large; you may have to call `apriori` with `low_memory=True` in your case....
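My guess (not verified against the mlxtend internals) is a 32-bit index overflow once the candidate-times-transaction product gets large; a minimal NumPy illustration of that failure mode, with made-up counts:

```python
import numpy as np

# Illustrative counts only; chosen so that their product
# (3_000_000_000) does not fit in a signed 32-bit integer.
n_candidates = np.array([30_000], dtype=np.int32)
n_transactions = np.array([100_000], dtype=np.int32)

flat = n_candidates * n_transactions          # wraps around in int32
exact = np.int64(n_candidates[0]) * np.int64(n_transactions[0])

# The wrapped index is negative, which would break any later lookup.
print(flat[0], exact)
```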
@kno10 Are your benchmark scripts and data sets available somewhere?