benchm-ml
mxnet sparse data format
Motivation: I can't run mxnet on the 10M-record airline dataset (https://github.com/szilard/benchm-ml/issues/29) because model.matrix runs out of RAM (on a g2.8xlarge with 60GB of RAM, the largest available for GPU instances).
Using Matrix::sparse.model.matrix to encode the categorical data would be great (it uses <2GB of RAM), but I get:
Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Strangely, on the 1M dataset I get a different error:
Error: io.cc:50: Seems X, y was passed in a Row major way, MXNetR adopts a column major convention.
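For scale, here is a back-of-the-envelope sketch of the memory involved, written in Python purely as arithmetic (the thread itself uses R). The 700 columns and 9 nonzeros per row are hypothetical placeholders for the width of the one-hot encoded data, not figures measured on the airline set, and the helper names are made up for illustration. It compares a dense matrix of 8-byte doubles with compressed sparse column storage, and checks whether the dense element count exceeds a signed 32-bit integer. That limit may be related to the 'problem too large' error raised from cholmod_dense.c, but this is an assumption, not a diagnosis.

```python
# Rough memory arithmetic only. The column count and nonzeros-per-row used
# below are hypothetical placeholders, not measurements from the airline data.

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def dense_bytes(rows: int, cols: int) -> int:
    """Bytes needed for a dense matrix of 8-byte doubles."""
    return rows * cols * 8


def sparse_csc_bytes(rows: int, cols: int, nnz_per_row: int) -> int:
    """Bytes for compressed sparse column storage: an 8-byte value and a
    4-byte row index per nonzero, plus (cols + 1) 4-byte column pointers."""
    nnz = rows * nnz_per_row
    return nnz * (8 + 4) + (cols + 1) * 4


def summarize(rows: int, cols: int, nnz_per_row: int) -> dict:
    """Collect the dense size, sparse size, and the 32-bit overflow check."""
    return {
        "dense_gb": dense_bytes(rows, cols) / 1e9,
        "sparse_gb": sparse_csc_bytes(rows, cols, nnz_per_row) / 1e9,
        "dense_elements_exceed_int32": rows * cols > INT32_MAX,
    }


if __name__ == "__main__":
    # Assumed shape: 700 one-hot columns, 9 nonzeros per row (both hypothetical).
    for rows in (1_000_000, 10_000_000):
        print(rows, summarize(rows, cols=700, nnz_per_row=9))
```

With these placeholder numbers, the dense form of the 10M set needs about 56 GB and the sparse form about 1 GB. The 10M dense matrix would also have more elements than a signed 32-bit integer can count, while the 1M one would not.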
@tqchen @hetong007 Is sparse representation on the roadmap? See the thread above. (I know mxnet is very new, and I have to tell you I think it already looks pretty great.)
Yes, this is something we should look into. Can you also open an issue at https://github.com/dmlc/mxnet/issues? Thanks.
Cool, I'll do it soon.