
Write splits for sparse data into sparse arff format

Open sebhrusen opened this issue 3 years ago • 1 comment

https://github.com/openml/automlbenchmark/issues/375

Still need to ensure that all frameworks reading arff files support this sparse format:

  • H2O: no; might consider consuming parquet files instead.
  • AutoWeka: relies on WEKA, so it should support sparse data.
  • MLPlan: relies on WEKA (?)
  • autoxgboost: relies on farff::readARFF (see below).
  • mlr3automl: relies on mlr3oml::read_arff, https://rdrr.io/github/mlr-org/mlr3oml/man/read_arff.html
  • ranger (actually just a reference R impl of RF): relies on farff::readARFF, which doesn't support sparse ARFF files, https://rdrr.io/cran/farff/man/readARFF.html
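
For reference, here is a minimal sketch of what the sparse ARFF layout looks like when written out. It is hand-rolled for illustration and is not the benchmark's actual writer: the function name `to_sparse_arff` and the row representation (one dict of `{column_index: value}` per row) are assumptions. It relies only on the WEKA sparse convention, where each data row is written as `{index value, ...}` with 0-based attribute indices and zero values omitted.

```python
def to_sparse_arff(relation, attributes, rows):
    """Render rows as a sparse ARFF document (hypothetical helper).

    relation:   name used in the @RELATION header
    attributes: list of (name, arff_type) pairs, in column order
    rows:       list of dicts mapping 0-based column index -> value;
                missing keys are treated as zeros
    """
    lines = [f"@RELATION {relation}", ""]
    for name, arff_type in attributes:
        lines.append(f"@ATTRIBUTE {name} {arff_type}")
    lines += ["", "@DATA"]
    for row in rows:
        # Sparse rows list only the non-zero cells, ordered by column index.
        cells = ", ".join(
            f"{index} {value}"
            for index, value in sorted(row.items())
            if value != 0
        )
        lines.append("{" + cells + "}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    attrs = [("a", "NUMERIC"), ("b", "NUMERIC"), ("c", "NUMERIC")]
    data = [{0: 1.5, 2: 3}, {}, {1: 2}]
    print(to_sparse_arff("demo", attrs, data))
```

A reader that only understands dense ARFF, where each data row lists every attribute value separated by commas, will fail on the `{...}` rows, which is the mismatch listed above for `farff::readARFF`.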

sebhrusen avatar Aug 02 '21 21:08 sebhrusen

I think that's reasonable; it helps us get towards full sparse format support even if there's currently a mismatch for some frameworks.

PGijsbers avatar Aug 16 '21 09:08 PGijsbers