
Support for scipy.sparse matrices

Open Defasium opened this issue 2 years ago • 1 comment

Hi, @olegranmo! It would be great if the methods fit, predict and transform could accept scipy.sparse matrices, as most sklearn API models do (LogisticRegression, MultinomialNB, RandomForest, etc.). For example, in https://github.com/cair/tmu/blob/main/examples/MNISTDemo.py#L18, converting the binarized X_train into a scipy.sparse.csr_matrix can lower RAM consumption by a large factor. This would be very convenient for relatively large datasets (over 1 million examples) or when there are a lot of features (such as high-res images).
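
To give a rough idea of the saving, here is a minimal sketch that compares the memory footprint of a dense binarized matrix with its CSR form. The data is synthetic, and the shape, the 10% density and the uint32 dtype are assumptions chosen for illustration, not values measured from the MNIST demo. The helper csr_nbytes is hypothetical and not part of tmu or scipy.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)

# Synthetic stand-in for a binarized dataset: 1000 samples x 784 features,
# with roughly 10% of the bits set (assumed density, not measured from MNIST).
X_dense = (rng.random((1000, 784)) < 0.10).astype(np.uint32)

# CSR stores only the nonzero entries plus their column indices and row offsets.
X_sparse = csr_matrix(X_dense)


def csr_nbytes(m):
    """Hypothetical helper: total bytes held by the three CSR arrays."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


dense_bytes = X_dense.nbytes
sparse_bytes = csr_nbytes(X_sparse)

print(f"dense:  {dense_bytes} bytes")
print(f"sparse: {sparse_bytes} bytes")
print(f"ratio:  {dense_bytes / sparse_bytes:.1f}x")
```

The actual ratio depends on how sparse the binarized data is and on the dtype of the dense array. With a one-byte dense dtype such as uint8, CSR only pays off at a noticeably lower density, because each stored nonzero also carries a column index.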

Defasium, Jan 13 '23 21:01

Great point, @Defasium! Will add support for sparse matrices at the first opportunity. Currently, the class TMAutoEncoder uses sparse input matrices to deal with large text datasets.

olegranmo, Jan 13 '23 23:01