benchm-ml

mxnet sparse data format

Status: Open • opened by szilard • 3 comments

Motivation: I can't run mxnet on the 10M-record airline dataset (https://github.com/szilard/benchm-ml/issues/29) because model.matrix runs out of RAM, even on a g2.8xlarge with 60GB of RAM (the largest available among GPU instances).

Using Matrix::sparse.model.matrix to encode the categorical data would be great (it uses <2GB of RAM), but I get:

Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
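A rough estimate shows why a dense design matrix can exhaust RAM while the sparse one stays small. This sketch assumes 8-byte double values and 4-byte integer indices (the storage used by a `dgCMatrix`, the class `Matrix::sparse.model.matrix` returns by default), plus a hypothetical 700 one-hot columns and 8 non-zeros per row; the real column count for the airline data is not stated in this thread, and the helper names are illustrative only.

```python
# Back-of-the-envelope memory estimate for a one-hot encoded design matrix.
# Assumptions (hypothetical): 8-byte doubles, 4-byte integer indices,
# and a fixed number of non-zero entries per row.

INT32_MAX = 2**31 - 1


def dense_bytes(n_rows: int, n_cols: int, value_bytes: int = 8) -> int:
    """Dense storage keeps every cell, zeros included."""
    return n_rows * n_cols * value_bytes


def csc_bytes(n_rows: int, n_cols: int, nnz_per_row: int,
              value_bytes: int = 8, index_bytes: int = 4) -> int:
    """Compressed sparse column: one value and one row index per non-zero,
    plus n_cols + 1 column pointers."""
    nnz = n_rows * nnz_per_row
    return nnz * (value_bytes + index_bytes) + (n_cols + 1) * index_bytes


def exceeds_int32_cells(n_rows: int, n_cols: int) -> bool:
    """True if the dense cell count does not fit in a signed 32-bit integer."""
    return n_rows * n_cols > INT32_MAX


if __name__ == "__main__":
    n_rows, n_cols, nnz_per_row = 10_000_000, 700, 8  # hypothetical sizes
    print(f"dense:  {dense_bytes(n_rows, n_cols) / 1e9:.1f} GB")
    print(f"sparse: {csc_bytes(n_rows, n_cols, nnz_per_row) / 1e9:.2f} GB")
    print("dense cell count overflows int32:", exceeds_int32_cells(n_rows, n_cols))
```

With these assumed sizes the dense matrix needs about 56 GB and the sparse one under 1 GB. The dense cell count (7 × 10^9) also exceeds the signed 32-bit range, which might be related to CHOLMOD's 'problem too large' error when the sparse matrix is converted to a dense one; that is a guess, not a confirmed diagnosis.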

Strangely, on the 1M dataset I get a different error:

Error: io.cc:50: Seems X, y was passed in a Row major way, MXNetR adopts a column major convention.
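The MXNetR message refers to how a matrix is laid out in memory: R stores matrices column by column, while C and NumPy default to row by row. The small sketch below flattens the same 2 × 3 matrix both ways so the difference is visible. The helper names are illustrative and not part of MXNetR, and the sketch does not reproduce the check performed in `io.cc`, whose exact logic is not shown in this thread.

```python
# Row-major vs. column-major storage of the same matrix in one flat buffer.

def row_major_offset(i: int, j: int, n_rows: int, n_cols: int) -> int:
    """Offset of cell (i, j) when each row is contiguous (C / NumPy default)."""
    return i * n_cols + j


def col_major_offset(i: int, j: int, n_rows: int, n_cols: int) -> int:
    """Offset of cell (i, j) when each column is contiguous (R / Fortran)."""
    return i + j * n_rows


def flatten(matrix: list[list[float]], column_major: bool) -> list[float]:
    """Copy a list-of-rows matrix into a flat buffer in the chosen order."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if column_major:
        return [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]
    return [matrix[i][j] for i in range(n_rows) for j in range(n_cols)]


if __name__ == "__main__":
    X = [[1, 2, 3],
         [4, 5, 6]]  # 2 samples x 3 features, stored as a list of rows
    print(flatten(X, column_major=False))
    print(flatten(X, column_major=True))
```

For this `X`, the row-major buffer is `[1, 2, 3, 4, 5, 6]` and the column-major buffer is `[1, 4, 2, 5, 3, 6]`; reading one buffer with the other layout's offset formula silently scrambles samples and features.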

szilard, Nov 28 '15 03:11

@tqchen @hetong007 Is a sparse representation on the roadmap? See the thread above. (I know mxnet is very new, and I have to say it already looks pretty great.)

szilard, Dec 01 '15 05:12

Yes, this is something we should look into. Can you also open an issue at https://github.com/dmlc/mxnet/issues? Thanks.

tqchen, Dec 02 '15 04:12

Cool, I'll do it soon.

szilard, Dec 02 '15 05:12