mlxtend
apriori.py line 224: ValueError: negative dimensions are not allowed
https://github.com/rasbt/mlxtend/blob/115278bac14d7fc278885c0722da03f1c3b91604/mlxtend/frequent_patterns/apriori.py#L224
Processing 24785850 combinations | Sampling itemset size 6
Traceback (most recent call last):
  File "***.py", line 116, in <module>
    frequent_itemsets = apriori(df, min_support=0.8, use_colnames=True, verbose=1)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mlxtend\frequent_patterns\apriori.py", line 219, in apriori
    _bools = X[:, combin[:, 0]] == all_ones
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\_index.py", line 53, in __getitem__
    return self._get_sliceXarray(row, col)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\csc.py", line 222, in _get_sliceXarray
    return self._major_index_fancy(col)._minor_slice(row)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\compressed.py", line 693, in _major_index_fancy
    res_indices = np.empty(nnz, dtype=idx_dtype)
ValueError: negative dimensions are not allowed
In my run of apriori.py, the variable "combin" is a (4130975, 6) array of indices (dtype=int32).
In compressed.py, numpy's cumsum takes its dtype from the indices of "combin".
Negative values appeared once the cumulative sum exceeded the int32 maximum.
Not sure whether this is a bug in numpy's cumsum or in mlxtend's apriori.
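A minimal sketch of the suspected wrap-around (the values are made up for illustration, and dtype=np.int32 is forced so it reproduces regardless of platform):

import numpy as np

# Each element is 2**30, so the running total passes the int32 maximum
# (2**31 - 1 = 2147483647) at the second element and wraps to a negative
# value, which np.empty(nnz, ...) then rejects as a negative dimension.
a = np.full(3, 2**30, dtype=np.int32)
print(np.cumsum(a, dtype=np.int32))
# [ 1073741824 -2147483648 -1073741824]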
Below are the versions of the packages:
Python 3.7.4
mlxtend.__version__: '0.17.0'
numpy.__version__: '1.16.5'
Hm, not sure what's going on. It could also be related to the compression and a scipy bug. But doesn't cumsum always return an int64 array to make sure these issues don't happen? E.g.,
In [9]: a = np.array(range(1000000), dtype=np.int32)
In [10]: a.dtype
Out[10]: dtype('int32')
In [11]: np.cumsum(a).dtype
Out[11]: dtype('int64')
So, I am wondering how this happens with cumsum ...
Could you run the example above with your NumPy version? If cumsum returns int32, you can try to see if updating to NumPy v1.17.2 helps.
In [5]: a.dtype
Out[5]: dtype('int32')

In [6]: np.cumsum(a).dtype
Out[6]: dtype('int32')
Yes, that should be the problem. I will update numpy.
Thanks for confirming. In this case, we should probably add a warning to the apriori package. I am reopening this issue to address this at some point. I.e., we could simply add something like:
import warnings
import numpy as np
from distutils.version import LooseVersion as Version

if Version(np.__version__) < Version("1.17"):
    warnings.warn('SOME TEXT to explain the issue')
Just wanted to chime in and say that I am also experiencing this issue (line 302 instead of 224), but I confirmed that I have the latest numpy:
~/miniconda3/lib/python3.7/site-packages/mlxtend/frequent_patterns/apriori.py in apriori(df, min_support, use_colnames, max_len, verbose, low_memory)
300
301 if is_sparse:
--> 302 _bools = X[:, combin[:, 0]] == all_ones
303 for n in range(1, combin.shape[1]):
304 _bools = _bools & (X[:, combin[:, n]] == all_ones)
~/miniconda3/lib/python3.7/site-packages/scipy/sparse/_index.py in __getitem__(self, key)
51 return self._get_sliceXslice(row, col)
52 elif col.ndim == 1:
---> 53 return self._get_sliceXarray(row, col)
54 raise IndexError('index results in >2 dimensions')
55 elif row.ndim == 1:
~/miniconda3/lib/python3.7/site-packages/scipy/sparse/csc.py in _get_sliceXarray(self, row, col)
220
221 def _get_sliceXarray(self, row, col):
--> 222 return self._major_index_fancy(col)._minor_slice(row)
223
224 def _get_arrayXint(self, row, col):
~/miniconda3/lib/python3.7/site-packages/scipy/sparse/compressed.py in _major_index_fancy(self, idx)
691
692 nnz = res_indptr[-1]
--> 693 res_indices = np.empty(nnz, dtype=idx_dtype)
694 res_data = np.empty(nnz, dtype=self.dtype)
695 csr_row_index(M, indices, self.indptr, self.indices, self.data,
ValueError: negative dimensions are not allowed
I am using the sparse dtype instead of SparseDataFrame; when using SparseDataFrame with apriori, it kills my Jupyter kernel.
DataFrame (bools):
DF density: 0.1837714070794341
DF shape: (60603, 1694)

Versions:
pandas: 0.25.3
numpy: 1.18.1
mlxtend: 0.17.1
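For reference, a rough sketch of the sparse-dtype approach mentioned above (dense_df is a hypothetical stand-in for my one-hot encoded data):

import pandas as pd

# Hypothetical stand-in for a one-hot encoded boolean DataFrame.
dense_df = pd.DataFrame([[True, False], [False, True]], columns=["a", "b"])

# Convert each column to a pandas sparse dtype instead of using the
# deprecated SparseDataFrame container.
sparse_df = dense_df.astype(pd.SparseDtype("bool", fill_value=False))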
I do not have a clear understanding of this issue, but it looks like some indices are too large; you may have to call apriori with low_memory=True in your case. Anyway, this should be fixed by #646.
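For illustration, a sketch of that call, assuming df is the one-hot encoded boolean DataFrame (low_memory is a real apriori parameter, as the traceback signature above shows; the extent of the memory savings is an assumption on my part):

from mlxtend.frequent_patterns import apriori

# low_memory=True trades speed for a lower-memory evaluation of candidate
# itemsets, which should avoid building the huge intermediate index arrays
# that overflow int32 here.
frequent_itemsets = apriori(df, min_support=0.8, use_colnames=True,
                            verbose=1, low_memory=True)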