openml-python

Make downloading dataset features optional

Open PGijsbers opened this issue 4 years ago • 1 comments

Currently the data features file is always downloaded here. I propose we make it optional, similar to qualities and the dataset itself. If not downloaded, it should follow the same lazy loading behavior that qualities does.

Having features immediately available is often unnecessary. For example, automated runs don't use them, and when users download the data, much of the feature information is already available in the built dataframe. Always downloading them, however, brings the usual downsides: longer wait times for users, especially for datasets with high-cardinality categorical features (e.g. albert), and additional strain on the server.
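The lazy loading behavior proposed above could look roughly like the following. This is a minimal sketch, not the openml-python implementation: `LazyDataset`, `fetch_features`, and `fake_fetch` are hypothetical names used only for illustration, and the fetch function stands in for the actual server request.

```python
from typing import Callable, Dict, Optional


class LazyDataset:
    """Hypothetical stand-in for a dataset object (not the openml-python class)."""

    def __init__(
        self,
        fetch_features: Callable[[], Dict[int, str]],
        download_features: bool = False,
    ) -> None:
        self._fetch_features = fetch_features
        self._features: Optional[Dict[int, str]] = None
        # Download eagerly only when explicitly requested.
        if download_features:
            self._features = fetch_features()

    @property
    def features(self) -> Dict[int, str]:
        # Fetch on first access, then reuse the cached result.
        if self._features is None:
            self._features = self._fetch_features()
        return self._features


calls = []


def fake_fetch() -> Dict[int, str]:
    # Placeholder for the real download; records each call.
    calls.append("fetch")
    return {0: "age", 1: "income"}


dataset = LazyDataset(fake_fetch)  # nothing is fetched at construction time
```

With this shape, code that never touches `dataset.features` never triggers a download, and code that does pays the cost only once.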

PGijsbers avatar May 14 '21 09:05 PGijsbers

We should also update the dataset repr method so that it can obtain the number of features from the qualities if those are available and the features are not.
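A possible shape for that fallback, as a sketch only: `describe_feature_count` is a hypothetical helper, and the quality key `"NumberOfFeatures"` is an assumption about how the count is stored in the qualities.

```python
from typing import Dict, Optional


def describe_feature_count(
    features: Optional[Dict[int, str]],
    qualities: Optional[Dict[str, float]],
) -> str:
    """Return a feature count for use in a repr, without forcing a download."""
    # Prefer the features themselves when they are already loaded.
    if features is not None:
        return str(len(features))
    # Otherwise fall back to the qualities (assumed key: "NumberOfFeatures").
    if qualities is not None and "NumberOfFeatures" in qualities:
        return str(int(qualities["NumberOfFeatures"]))
    # Neither is loaded, so report that instead of triggering a request.
    return "unknown"
```

The point of the design is that printing a dataset should never cause a server request on its own.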

PGijsbers avatar May 14 '21 10:05 PGijsbers

This issue is resolved. We introduced lazy loading in #1260 and made downloading the features and qualities optional from 0.15.0 onwards.

LennartPurucker avatar Jun 16 '23 07:06 LennartPurucker