
apriori.py line 224: ValueError: negative dimensions are not allowed

Open · fixablecar opened this issue on Oct 30, 2019 · 6 comments

https://github.com/rasbt/mlxtend/blob/115278bac14d7fc278885c0722da03f1c3b91604/mlxtend/frequent_patterns/apriori.py#L224

Processing 24785850 combinations | Sampling itemset size 6
Traceback (most recent call last):
  File "***.py", line 116, in <module>
    frequent_itemsets = apriori(df, min_support=0.8, use_colnames=True, verbose=1)
  File "C:\ProgramData\Anaconda3\lib\site-packages\mlxtend\frequent_patterns\apriori.py", line 219, in apriori
    _bools = X[:, combin[:, 0]] == all_ones
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\_index.py", line 53, in __getitem__
    return self._get_sliceXarray(row, col)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\csc.py", line 222, in _get_sliceXarray
    return self._major_index_fancy(col)._minor_slice(row)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\compressed.py", line 693, in _major_index_fancy
    res_indices = np.empty(nnz, dtype=idx_dtype)
ValueError: negative dimensions are not allowed

In my run, the variable "combin" in apriori.py is a (4130975, 6) array of indices (dtype=int32).

In compressed.py, numpy's cumsum takes its dtype from the indices derived from "combin".

Negative values appeared once the cumulative sum exceeded the int32 maximum (2147483647) and wrapped around.

I am not sure whether this should be handled in numpy's cumsum or in mlxtend's apriori.
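
For illustration, a minimal sketch of the wrap-around (not from the original report; dtype=np.int32 is forced explicitly so the result is the same on every platform):

import numpy as np

# Three values whose running sum passes the int32 maximum (2147483647).
a = np.full(3, 1_000_000_000, dtype=np.int32)

# Forcing int32 accumulation mimics what happens implicitly on platforms
# whose default integer is 32-bit; the sum silently wraps to a negative.
print(np.cumsum(a, dtype=np.int32))
# [ 1000000000  2000000000 -1294967296]

A negative value like this then reaches np.empty(nnz, ...) in scipy, which raises the "negative dimensions are not allowed" error.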

fixablecar · Oct 30 '19 07:10

Below are the versions of the packages, on Python 3.7.4:

mlxtend.__version__: '0.17.0'

numpy.__version__: '1.16.5'

fixablecar · Oct 30 '19 07:10

Hm, not sure what's going on here. It could also be related to the compression and a scipy bug. But doesn't cumsum always return an int64 array to make sure these issues don't happen? E.g.,

In [9]: a = np.array(range(1000000), dtype=np.int32)                            

In [10]: a.dtype                                                                
Out[10]: dtype('int32')

In [11]: np.cumsum(a).dtype                                                     
Out[11]: dtype('int64')

So, I am wondering how this happens with cumsum ...

Could you run the example above with your NumPy version? If cumsum returns int32, you could try updating to NumPy v1.17.2 to see if that helps.
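
A plausible explanation, for reference rather than part of the original exchange: per the NumPy docs, cumsum promotes integer inputs only up to the platform's default integer (np.int_), which is typically 64-bit on Linux/macOS but 32-bit on Windows, matching the C:\ paths in the traceback above. A quick check:

import numpy as np

# cumsum promotes small integer dtypes only up to the platform default
# integer (np.int_): typically int64 on Linux/macOS, int32 on Windows.
print(np.dtype(np.int_))                      # platform default integer
print(np.cumsum(np.ones(3, np.int32)).dtype)  # follows that default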

rasbt · Oct 30 '19 14:10

a.dtype
Out[5]: dtype('int32')

np.cumsum(a).dtype
Out[6]: dtype('int32')

Yes, that should be the problem. I will update NumPy.

fixablecar · Oct 31 '19 05:10

Thanks for confirming. In this case, we should probably add a warning to the apriori package. I am reopening this issue to address this at some point. I.e., we could simply add:

import warnings
from distutils.version import LooseVersion as Version

import numpy as np

# Warn on NumPy < 1.17, where (as reported above) cumsum on an int32
# array can return int32 and silently overflow on large inputs.
if Version(np.__version__) < Version("1.17"):
    warnings.warn('SOME TEXT to explain the issue')

rasbt · Oct 31 '19 14:10

Just wanted to chime in and say that I am also experiencing this issue (at line 302 instead of 224), even though I confirmed that I have the latest NumPy:

 ~/miniconda3/lib/python3.7/site-packages/mlxtend/frequent_patterns/apriori.py in apriori(df, min_support, use_colnames, max_len, verbose, low_memory)
     300 
     301             if is_sparse:
 --> 302                 _bools = X[:, combin[:, 0]] == all_ones
     303                 for n in range(1, combin.shape[1]):
     304                     _bools = _bools & (X[:, combin[:, n]] == all_ones)
 
 ~/miniconda3/lib/python3.7/site-packages/scipy/sparse/_index.py in __getitem__(self, key)
      51                 return self._get_sliceXslice(row, col)
      52             elif col.ndim == 1:
 ---> 53                 return self._get_sliceXarray(row, col)
      54             raise IndexError('index results in >2 dimensions')
      55         elif row.ndim == 1:
 
 ~/miniconda3/lib/python3.7/site-packages/scipy/sparse/csc.py in _get_sliceXarray(self, row, col)
     220 
     221     def _get_sliceXarray(self, row, col):
 --> 222         return self._major_index_fancy(col)._minor_slice(row)
     223 
     224     def _get_arrayXint(self, row, col):
 
 ~/miniconda3/lib/python3.7/site-packages/scipy/sparse/compressed.py in _major_index_fancy(self, idx)
     691 
     692         nnz = res_indptr[-1]
 --> 693         res_indices = np.empty(nnz, dtype=idx_dtype)
     694         res_data = np.empty(nnz, dtype=self.dtype)
     695         csr_row_index(M, indices, self.indptr, self.indices, self.data,
 
 ValueError: negative dimensions are not allowed

I am using the sparse dtype instead of SparseDataFrame; when I use SparseDataFrame with apriori, it kills my Jupyter kernel.

DataFrame (bools):

density: 0.1837714070794341
shape: (60603, 1694)

Versions:

pandas: 0.25.3
numpy: 1.18.1
mlxtend: 0.17.1
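
For anyone reproducing this setup, here is a minimal sketch of building a boolean sparse-dtype DataFrame for apriori (toy data standing in for the real 60603 x 1694 frame; the TransactionEncoder usage is my assumption, not stated above):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Toy transactions; the real data is far larger and far sparser.
transactions = [['milk', 'bread'], ['bread', 'butter'], ['milk', 'bread', 'butter']]

te = TransactionEncoder()
sparse_arr = te.fit(transactions).transform(transactions, sparse=True)  # scipy sparse matrix

# Sparse-dtype DataFrame, the pandas >= 0.25 replacement for SparseDataFrame.
df = pd.DataFrame.sparse.from_spmatrix(sparse_arr, columns=te.columns_)

frequent = apriori(df, min_support=0.5, use_colnames=True)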

jakeplace · Feb 05 '20 12:02

I do not have a clear understanding of this issue, but it looks like some indices are too large; you may have to call apriori with low_memory=True in your case (see the sketch below). In any case, this should be fixed by #646.
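
For reference, a usage sketch of that suggestion, with df being a one-hot frame like the one above and all other parameters taken from the first report:

from mlxtend.frequent_patterns import apriori

# low_memory=True uses a slower but more memory-efficient candidate
# search, which should keep the index arrays passed to scipy small.
frequent_itemsets = apriori(df, min_support=0.8, use_colnames=True,
                            verbose=1, low_memory=True)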

dbarbier · Feb 05 '20 13:02