automlbenchmark Write splits for sparse data into sparse arff format

Open sebhrusen opened this issue 3 years ago • 1 comment

https://github.com/openml/automlbenchmark/issues/375

Still need to ensure that all frameworks reading arff files support this sparse format:

H2O: no; might consider consuming parquet files instead.
AutoWeka: relies on WEKA, so it should support sparse data.
MLPlan: relies on WEKA (?)
autoxgboost: relies on farff::readARFF (see below).
mlr3automl: relies on mlr3oml::read_arff, https://rdrr.io/github/mlr-org/mlr3oml/man/read_arff.html
ranger (actually just a reference R implementation of RF): relies on farff::readARFF, which doesn't support sparse ARFF files, https://rdrr.io/cran/farff/man/readARFF.html
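
For reference, this is a minimal sketch of what the sparse ARFF data section looks like, so it is clear what each reader above has to parse. It is not the benchmark's actual writer: the function name `sparse_arff_lines` is hypothetical, and it assumes every attribute is numeric. In sparse ARFF each row is written as `{index value, ...}` with zero-based attribute indices, and any omitted entry is read as 0, not as missing.

```python
from typing import Iterable, List, Sequence


def sparse_arff_lines(
    relation: str,
    attributes: Sequence[str],
    rows: Iterable[Sequence[float]],
) -> List[str]:
    """Render dense numeric rows as the lines of a sparse ARFF file.

    Hypothetical helper for illustration; assumes all attributes are NUMERIC
    and that attribute names need no quoting.
    """
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in attributes]
    lines += ["", "@DATA"]
    for row in rows:
        # Keep only non-zero cells, each written as "<zero-based index> <value>".
        entries = [f"{i} {v:g}" for i, v in enumerate(row) if v != 0]
        lines.append("{" + ", ".join(entries) + "}")
    return lines


if __name__ == "__main__":
    demo = sparse_arff_lines(
        "demo",
        ["a", "b", "c", "d"],
        [[0, 1.5, 0, 0], [2, 0, 0, 3]],
    )
    print("\n".join(demo))
```

A reader without sparse support will typically fail on the `{...}` rows, since it expects comma-separated values with one entry per attribute.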

Aug 02 '21 21:08 sebhrusen

I think that's reasonable: it helps us get towards full sparse format support, even if some frameworks currently can't read it.

Aug 16 '21 09:08 PGijsbers