automlbenchmark
Write splits for sparse data into sparse arff format
https://github.com/openml/automlbenchmark/issues/375
Still need to ensure that all frameworks reading arff files support this sparse format:
- H2O: no; might consider consuming parquet files instead.
- AutoWeka: relies on WEKA, so it should support sparse data.
- MLPlan: relies on WEKA (?)
- autoxgboost: relies on `farff::readARFF` (see below).
- mlr3automl: relies on `mlr3oml::read_arff`, https://rdrr.io/github/mlr-org/mlr3oml/man/read_arff.html
- ranger (actually just a reference R impl of RF): relies on `farff::readARFF`, which doesn't support sparse ARFF files, https://rdrr.io/cran/farff/man/readARFF.html
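
For reference, a minimal sketch of what the sparse ARFF layout looks like on disk: each data row lists only its non-zero entries as `{index value, ...}` pairs with 0-based attribute indices. This is not the benchmark's actual writer; `to_sparse_arff` and the toy data are hypothetical names used for illustration, and the sketch assumes all attributes are numeric with no missing values.

```python
def to_sparse_arff(relation, attributes, rows):
    """Serialize dense numeric rows as a sparse ARFF string.

    Hypothetical helper: `attributes` is a list of attribute names
    (all declared NUMERIC here), `rows` is a list of equal-length
    lists of numbers.
    """
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in attributes]
    lines += ["", "@DATA"]
    for row in rows:
        # Sparse ARFF keeps only non-zero entries, written as
        # "<0-based attribute index> <value>" inside curly braces.
        entries = [f"{i} {v}" for i, v in enumerate(row) if v != 0]
        lines.append("{" + ", ".join(entries) + "}")
    return "\n".join(lines) + "\n"


arff_text = to_sparse_arff(
    "toy_split",
    ["f0", "f1", "f2", "label"],
    [[0, 2.5, 0, 1], [0, 0, 0, 0], [3, 0, 0, 1]],
)
print(arff_text)
```

An all-zero row is written as `{}`. A reader that only understands dense ARFF (comma-separated values per row) will fail on these braces, which is why each framework's ARFF reader needs to be checked.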
I think that's reasonable: it helps us get towards full sparse format support, even if there's currently a mismatch for some frameworks.