seb. issues

Results: 37 issues of seb.

Create GH workflow for unit tests

Unit tests are currently configured to run on Travis via a webhook, which no longer seems to be triggered, or at least is not easily accessible. Let's just turn this into a...

automation

Define abstractions for framework integration

The goal is to provide an abstraction and default implementation(s) for the most common scenarios. This would also allow frameworks to support several versions easily. Finally, a more structured framework runner...

enhancement

framework

Automate testing / version upgrade / docker images uploads

To ensure that we don't break the app with future changes, we should automate some basic testing/verification tasks. I suggest the following: ``` fresh git clone of the repo fresh...

quality

automation

Write splits for sparse data into sparse arff format

https://github.com/openml/automlbenchmark/issues/375 We still need to ensure that all frameworks reading arff files support this sparse format: - H2O: no, might consider consuming parquet files instead. - AutoWeka: relies on WEKA,...
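For readers unfamiliar with the sparse ARFF layout discussed in this issue, here is a minimal sketch of how a row could be written in that format. It is not the benchmark's actual writer: the function names `sparse_arff_row` and `to_sparse_arff` are hypothetical, all attributes are assumed to be numeric, and rows are assumed to be plain Python sequences. It relies only on the documented convention that a sparse ARFF data line lists the non-zero entries as `{index value, ...}` with 0-based attribute indices.

```python
def sparse_arff_row(values):
    """Format one row as a sparse ARFF data line.

    Only non-zero entries are written, as "index value" pairs
    (0-based index), separated by commas and wrapped in braces.
    """
    entries = [f"{i} {v}" for i, v in enumerate(values) if v != 0]
    return "{" + ", ".join(entries) + "}"


def to_sparse_arff(relation, attributes, rows):
    """Build a sparse ARFF document as a string.

    Assumption: every attribute is NUMERIC (hypothetical simplification).
    """
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in attributes]
    lines += ["", "@DATA"]
    lines += [sparse_arff_row(row) for row in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_sparse_arff("demo", ["a", "b", "c", "d"],
                         [[0, 1.5, 0, 2], [3, 0, 0, 0]]))
```

Whether a given framework accepts such a file still has to be checked per framework, as the issue notes.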

Scipy sparse matrices not handled correctly by TPOT and autosklearn

Failing datasets: https://openml.org/t/360932 https://openml.org/t/360932 - serialization of sparse matrices was not applied correctly. - once fixed, the frameworks still fail with the following errors: ```text # TPOT File "/Users/seb/repos/ml/automlbenchmark/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/base.py", line...

Add support for sparse data in Arff and parquet formats.

Frameworks that mainly support Arff or Parquet format (instead of pandas DF) should be allowed to consume sparse data.

issue with OpenML tasks openml.org/t/359947 & openml.org/t/360115

Error with TPOT, Autosklearn, RandomForest (and other sklearn-based frameworks): ```text [ERROR] [amlb.benchmark:01:28:09.198] PyOpenML cannot handle string when returning numpy arrays. Use dataset_format="dataframe". Traceback (most recent call last): File "/Users/seb/.pyenv/versions/amlb/lib/python3.7/site-packages/openml/datasets/dataset.py",...

data

Support for execution plans

https://github.com/openml/automlbenchmark/issues/169 Execution plans will make it possible to run benchmarks for multiple frameworks, benchmark definitions, constraints, tasks, folds… and still benefit from full parallelism, especially in AWS mode. An execution plan is...

Workflow to build and publish docker images

For the `stable` branch only, it would be useful to have a workflow that takes a framework definitions file, and from there: - list all frameworks for which to build...

Add support for sparse data

Currently, sparse datasets are automatically converted into dense data, generating extremely large datasets that can lead to OOM. OpenML provides some datasets in sparse ARFF format: see for example https://www.openml.org/t/317613...

enhancement
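To illustrate why densifying sparse data can lead to OOM, here is a rough back-of-the-envelope sketch comparing the memory footprint of a dense array with a CSR-style sparse layout. The shapes and density below are made-up illustrative values, not measurements from any OpenML dataset, and the helper names `dense_bytes` and `csr_bytes` are hypothetical. The CSR estimate assumes 8-byte values and 4-byte indices and ignores object overhead.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Approximate size of a dense array with `itemsize` bytes per cell."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, itemsize=8, index_size=4):
    """Approximate size of a CSR layout.

    One value and one column index per non-zero entry,
    plus a row-pointer array of length n_rows + 1.
    """
    return nnz * (itemsize + index_size) + (n_rows + 1) * index_size


if __name__ == "__main__":
    # Illustrative assumption: 10,000 x 100,000 matrix, 0.1% non-zero.
    n_rows, n_cols = 10_000, 100_000
    nnz = n_rows * n_cols // 1000
    print(f"dense: {dense_bytes(n_rows, n_cols) / 1e9:.2f} GB")
    print(f"csr:   {csr_bytes(n_rows, nnz) / 1e6:.2f} MB")
```

Under these assumptions the dense form needs about 8 GB while the sparse form needs about 12 MB, which is the kind of gap that makes automatic conversion risky.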