Pieter Gijsbers comments

Results: 466 comments by Pieter Gijsbers

Proposal to include `numberOfInstances` and `numberOfFeatures` qualities in the dataset description

In the case of the automl benchmark, we actually approach the dataset through its task (we know the task id). So using the `list_datasets` or getting the `qualities` directly both...

Speed up Run indexing

I suspected this was the case, and wrote a script to verify (and then I saw the open issue, oops). I'll just leave my code here that shows the effect,...

Column names with '\%' are renamed

I edited your post to use code snippets, as otherwise the backslashes are not visible, which makes the report very confusing :) I think using the features in their un-escaped form...

Column names with '\%' are renamed

My bad, I misread your explanation. It looks like pandas has the escaped feature names because they are already escaped in the ARFF header. This means it's not an `openml-python`...

What date format is expected/preferred for a dataset's collection date?

In that case I would propose to use the YYYY-MM-DD format as per [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601).
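As an illustration of the proposed format, here is a minimal sketch using only the Python standard library. The helper names `format_collection_date` and `parse_collection_date` are hypothetical and not part of `openml-python`; the sketch only shows that `datetime.date` reads and writes the YYYY-MM-DD form directly.

```python
from datetime import date


def format_collection_date(d: date) -> str:
    """Render a date as YYYY-MM-DD (ISO 8601 calendar date)."""
    return d.isoformat()


def parse_collection_date(text: str) -> date:
    """Parse a YYYY-MM-DD string; raises ValueError for other layouts such as DD-MM-YYYY."""
    return date.fromisoformat(text)


# Hypothetical collection date, chosen only for illustration.
collection_date = date(2021, 3, 9)
formatted = format_collection_date(collection_date)
round_tripped = parse_collection_date(formatted)
```

A side benefit of this format is that the strings sort chronologically when sorted as plain text.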

OpenML API parquet migration - Phase 2

I created https://github.com/openml/openml-python/issues/1141. Can you elaborate on the new sequence of communication for uploading the dataset from a client API? Are the new endpoints already available?
> Assign the uploaded...

Simplify data splits for classification/regression tasks

As icing on the cake we probably should also make these parquet files, of course :) And perhaps consider saving the files after they are generated, so they don't need...

Added unit test to verify how the dataset object handles comparisons

I'll wait for AppVeyor to complete. GitHub Actions seems to fail because the workload on the server is too high. I will throttle the number of parallel jobs further in the...

Code coverage fails

If I recall correctly, this may have been caused by the unit test cleanup also removing the coverage file before upload.

Code coverage fails

I remember working on this and seeing the coverage file get deleted on my system by the test cleanup (because it cleans all new files from `openml` instead of `openml/tests/files/`)....
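The effect described here can be sketched roughly as follows. This is not the actual `openml-python` test cleanup: the helpers `snapshot` and `remove_new_files`, the `.coverage` file location, and the `cached.arff` file are all assumptions made for illustration. The sketch only shows that limiting the cleanup root to `openml/tests/files/` leaves a coverage file elsewhere in the package untouched.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def snapshot(root: Path) -> Set[Path]:
    """Collect every file currently under root (hypothetical helper)."""
    return {p for p in root.rglob("*") if p.is_file()}


def remove_new_files(cleanup_root: Path, before: Set[Path]) -> List[Path]:
    """Delete files under cleanup_root that were not present in the earlier snapshot."""
    removed = []
    for path in snapshot(cleanup_root) - before:
        path.unlink()
        removed.append(path)
    return sorted(removed)


with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "openml"
    test_files = package / "tests" / "files"
    test_files.mkdir(parents=True)

    # Snapshot only the test files directory, not the whole package.
    before = snapshot(test_files)

    # Files created during a pretend test run (assumed names and locations).
    coverage_file = package / ".coverage"
    coverage_file.write_text("coverage data")
    cached = test_files / "cached.arff"
    cached.write_text("@relation demo")

    removed = remove_new_files(test_files, before)
    coverage_survived = coverage_file.exists()
    cache_removed = not cached.exists()
```

Passing `package` instead of `test_files` to both calls would delete the coverage file as well, which matches the behaviour reported in the comment.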