benchm-ml

sklearn using sparse data representation

Open szilard opened this issue 8 years ago • 2 comments

I know from @glouppe that "RFs in sklearn now support sparse matrices too" https://twitter.com/glouppe/status/660012865554903040

It would be interesting to see the results with sparse input for RF and for logistic regression too. We should see a lower memory footprint and perhaps faster runs. Does anyone want to help with the code (PR)?

szilard avatar Nov 06 '15 21:11 szilard

Good guess, but the reality may be cruel: sparse matrices can cut memory usage a lot, but give no significant speedup. sklearn depends on scipy, so if you want to try it: in 2-rf/2.py, use http://docs.scipy.org/doc/scipy/reference/sparse.html instead of pandas to create the training matrix.
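
For anyone picking this up, here is a minimal sketch of that idea. It assumes synthetic categorical data rather than the benchmark's actual dataset, and the column names, sizes, and the `sparse_nbytes` helper are made up for illustration. It one-hot encodes into a scipy CSR matrix, fits a random forest on it directly, and compares the storage against the dense equivalent.

```python
import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n_rows = 2000

# Hypothetical categorical columns standing in for the real training data.
carrier = rng.choice(["AA", "DL", "UA", "WN"], size=n_rows)
origin = rng.choice([f"AP{i}" for i in range(50)], size=n_rows)
X_cat = np.column_stack([carrier, origin])
y = rng.integers(0, 2, size=n_rows)

# OneHotEncoder returns a scipy sparse matrix by default; make it CSR explicitly.
X_sparse = sparse.csr_matrix(OneHotEncoder().fit_transform(X_cat))
X_dense = X_sparse.toarray()


def sparse_nbytes(m):
    """Bytes held by the three arrays backing a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


print("dense bytes: ", X_dense.nbytes)
print("sparse bytes:", sparse_nbytes(X_sparse))

# The forest accepts the sparse matrix as-is, no densifying needed.
rf = RandomForestClassifier(n_estimators=20, random_state=0)
rf.fit(X_sparse, y)
```

The memory gap grows with the number of distinct categories, since each row stores only its nonzero entries.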

ghost avatar May 05 '16 06:05 ghost

Yeah, scipy's sparse is what I had in mind, and I was hoping someone could take a look. You could try this simplified setup https://github.com/szilard/benchm-ml/tree/master/z-other-tools with the initial Python code here https://github.com/szilard/benchm-ml/blob/master/z-other-tools/2.py You could time both this and a sparse version and submit the results here or as a PR.
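
A rough sketch of how such a timing comparison could look, assuming a randomly generated sparse matrix instead of the benchmark data. The `time_fit` helper, the matrix shape, and the density are hypothetical choices, and the numbers it prints say nothing about the real dataset.

```python
import time

import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: 5000 rows, 300 features, about 2% nonzero entries.
X_sparse = sparse.random(5000, 300, density=0.02, format="csr", random_state=0)
X_dense = X_sparse.toarray()
y = np.random.default_rng(0).integers(0, 2, size=X_sparse.shape[0])


def time_fit(model, X, y):
    """Return wall-clock seconds spent in model.fit(X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


timings = {}
for name, make_model in [
    ("rf", lambda: RandomForestClassifier(n_estimators=20, random_state=0)),
    ("logreg", lambda: LogisticRegression(max_iter=200)),
]:
    # A fresh model per run so the dense and sparse fits do not share state.
    timings[(name, "dense")] = time_fit(make_model(), X_dense, y)
    timings[(name, "sparse")] = time_fit(make_model(), X_sparse, y)

for (name, kind), seconds in sorted(timings.items()):
    print(f"{name:7s} {kind:7s} {seconds:.3f}s")
```

For results worth posting, the same loop would need to run on the actual training data, ideally repeated a few times per configuration.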

szilard avatar May 05 '16 14:05 szilard