dask-glm

[WIP] Allow pure numpy array (not dask array) as inputs

Open • daxiongshu opened this pull request 4 years ago • 2 comments

Currently, dask_glm.estimators accepts only dask.array inputs. This is caused by the lines linked below, and by other places where ._meta is accessed without first checking the input type.

https://github.com/dask/dask-glm/blob/7b2f85fe043eb29212755e67e33e3df553ed0e58/dask_glm/estimators.py#L67
https://github.com/dask/dask-glm/blob/7b2f85fe043eb29212755e67e33e3df553ed0e58/dask_glm/utils.py#L120-L124

Example code and error

Code:

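```python
from dask_glm.estimators import LogisticRegression
import numpy

# Plain in-memory NumPy inputs, not dask arrays
x = numpy.random.rand(10, 4)
y = numpy.random.rand(10)

lr = LogisticRegression()
lr.fit(x, y)  # raises AttributeError before this PR
```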

Error:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-e644bf405118> in <module>
----> 1 lr.fit(x,y)

~/rapids/daskml_cupy/dask-glm/dask_glm/estimators.py in fit(self, X, y)
     65         X_ = self._maybe_add_intercept(X)
     66         fit_kwargs = dict(self._fit_kwargs)
---> 67         if is_dask_array_sparse(X):
     68             fit_kwargs['normalize'] = False
     69 

~/rapids/daskml_cupy/dask-glm/dask_glm/utils.py in is_dask_array_sparse(X)
    122     Check using _meta if a dask array contains sparse arrays
    123     """
--> 124     return isinstance(X._meta, sparse.SparseArray)
    125 
    126 

AttributeError: 'numpy.ndarray' object has no attribute '_meta'
```
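The failing check can be made tolerant of in-memory arrays. The sketch below is a minimal illustration, assuming the only goal is that a plain NumPy array should report "not sparse" instead of raising: it reads _meta with getattr and falls back to the array itself. This is a hypothetical variant of is_dask_array_sparse, not necessarily the change made in this PR, and the optional import guard for the sparse package is also an assumption.

```python
import numpy as np

try:
    import sparse  # pydata/sparse; treated as optional in this sketch
except ImportError:
    sparse = None


def is_dask_array_sparse(X):
    """Hypothetical guarded variant: True if X holds sparse data.

    Dask arrays expose a ``_meta`` describing their chunk type; in-memory
    arrays do not, so fall back to inspecting X itself instead of raising.
    """
    meta = getattr(X, "_meta", X)
    return sparse is not None and isinstance(meta, sparse.SparseArray)


x = np.random.rand(10, 4)
print(is_dask_array_sparse(x))  # False
```

With this guard, the check in fit() no longer fails on NumPy input, although other code paths that rely on dask-specific attributes would still need similar handling.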

This PR allows plain NumPy arrays (not only NumPy-backed dask arrays) to be passed directly as inputs.
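Another way to accept NumPy input is to wrap it as a dask array at the entry point, so that downstream code reading ._meta keeps working unchanged. The sketch below is a minimal illustration of that alternative, assuming dask is installed. The helper name as_dask_array and the chunks="auto" default are hypothetical, and this is not necessarily the approach this PR takes.

```python
import numpy as np
import dask.array as da


def as_dask_array(X, chunks="auto"):
    """Hypothetical helper: return X as a dask array.

    Dask arrays pass through unchanged; anything else (e.g. a NumPy
    ndarray) is wrapped with dask.array.from_array, which gives it a
    ``_meta`` attribute like any other dask array.
    """
    if isinstance(X, da.Array):
        return X
    return da.from_array(np.asarray(X), chunks=chunks)


x = np.random.rand(10, 4)
y = np.random.rand(10)
X_d, y_d = as_dask_array(x), as_dask_array(y)
print(type(X_d._meta))  # <class 'numpy.ndarray'>
```

The trade-off is that converting keeps a single dask code path but adds graph overhead for small in-memory inputs, whereas a type check on each ._meta access avoids the conversion.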

daxiongshu, Oct 29 '20 06:10

@mrocklin @pentschev I have added only one test so far. If this approach looks OK, could you please suggest which other tests should also cover NumPy inputs? Thank you!

daxiongshu, Oct 29 '20 12:10

~I think I'm going to finish this first and then move on to #89~ Not really. I'll move on to #89

daxiongshu, Oct 29 '20 12:10