seb.
Unit tests are currently configured to run on Travis, thanks to a webhook, which doesn't seem to be triggered anymore, or not easily accessible. Let's just turn this into a...
The goal is to provide an abstraction and default implementation(s) for most common scenarios. This would also allow frameworks to support several versions easily. Finally, and more structured framework runner...
To ensure that we don't break the app with future changes, we should automate some basic testing/verification tasks. I suggest the following: ``` fresh git clone of the repo fresh...
https://github.com/openml/automlbenchmark/issues/375 Still need to ensure that all frameworks reading arff files support this sparse format: - H2O : no, might consider consuming parquet files instead. - AutoWeka: relies on WEKA,...
Failing datasets: https://openml.org/t/360932 https://openml.org/t/360932 - serialization of sparse matrices was not applied correctly. - once fixed, the frameworks still fail with the following errors: ```text # TPOT File "/Users/seb/repos/ml/automlbenchmark/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/base.py", line...
Frameworks that mainly support Arff or Parquet format (instead of pandas DF) should be allowed to consume sparse data.
Error with TPOT, Autosklearn, RandomForest (and other sklearn-based frameworks): ```text [ERROR] [amlb.benchmark:01:28:09.198] PyOpenML cannot handle string when returning numpy arrays. Use dataset_format="dataframe". Traceback (most recent call last): File "/Users/seb/.pyenv/versions/amlb/lib/python3.7/site-packages/openml/datasets/dataset.py",...
https://github.com/openml/automlbenchmark/issues/169 Execution plans will make it possible to run benchmarks for multiple frameworks, benchmark definitions, constraints, tasks, folds… and still benefit from full parallelism, especially in AWS mode. An execution plan is...
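To illustrate the combinatorial expansion described above, here is a minimal sketch of how a plan could be flattened into independent jobs that can then run in parallel. The excerpt is cut off before it defines an execution plan, so the `plan` structure, the `expand_plan` helper, and the benchmark and constraint names are all hypothetical and are not the amlb design. Tasks are left out for brevity; they would simply be one more dimension.

```python
from itertools import product

# Hypothetical plan layout: one list of values per dimension.
# This is an assumption for illustration, not the actual amlb format.
plan = {
    "frameworks": ["TPOT", "RandomForest"],
    "benchmarks": ["small", "medium"],   # hypothetical benchmark names
    "constraints": ["1h8c"],             # hypothetical constraint name
    "folds": [0, 1, 2],
}


def expand_plan(plan):
    """Return one job dict per combination of the plan's dimensions."""
    dimensions = ("frameworks", "benchmarks", "constraints", "folds")
    job_keys = ("framework", "benchmark", "constraint", "fold")
    return [
        dict(zip(job_keys, combo))
        for combo in product(*(plan[name] for name in dimensions))
    ]


jobs = expand_plan(plan)
print(len(jobs))
print(jobs[0])
```

Because each job is self-contained, a scheduler could dispatch the jobs to separate workers (for example, one cloud instance per job) without any coordination between them.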
For the `stable` branch only, it would be useful to have a workflow that takes a framework definitions file, and from there: - list all frameworks for which to build...
Currently, sparse datasets are automatically converted into dense data, generating extremely large datasets that can lead to OOM. OpenML provides some datasets in sparse ARFF format: see for example https://www.openml.org/t/317613...
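A rough back-of-the-envelope sketch shows why densifying can lead to OOM. The shape and number of non-zeros below are made up for illustration and do not describe any particular OpenML dataset. The sparse estimate assumes a CSR-style layout with 8-byte values and 4-byte indices.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Memory needed to store every cell, zeros included."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, itemsize=8, index_size=4):
    """Approximate memory for a CSR layout:
    values + column indices + row pointer array."""
    return nnz * itemsize + nnz * index_size + (n_rows + 1) * index_size


# Hypothetical dataset: 100k rows, 20k columns, 20 non-zeros per row.
n_rows, n_cols, nnz_per_row = 100_000, 20_000, 20
nnz = n_rows * nnz_per_row

dense = dense_bytes(n_rows, n_cols)
sparse = csr_bytes(n_rows, nnz)

print(f"dense:  {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e6:.1f} MB")
print(f"ratio:  {dense / sparse:.0f}x")
```

Under these assumptions the dense representation is several hundred times larger than the sparse one, which is the kind of blow-up that keeping the data sparse end to end would avoid.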