
Request timeouts: https://www.openml.org/api/v1/json/data/features/23383

Open mitar opened this issue 6 years ago • 11 comments

Trying to access https://www.openml.org/api/v1/json/data/features/23383 just times out and never returns anything. Accessing other URIs works well.

mitar avatar Dec 12 '19 19:12 mitar

One more example: https://www.openml.org/api/v1/json/data/features/41147

mitar avatar Dec 16 '19 02:12 mitar

Another example: https://www.openml.org/api/v1/json/data/features/42435

mitar avatar May 29 '20 17:05 mitar

Another one: https://www.openml.org/api/v1/json/data/features/42706

mitar avatar Oct 12 '20 04:10 mitar

And: https://www.openml.org/api/v1/json/data/features/42708

mitar avatar Oct 13 '20 17:10 mitar

And: https://www.openml.org/api/v1/json/data/features/43034

mitar avatar Oct 20 '21 21:10 mitar

Update: this seems to be caused by bad feature types. Some of these are large datasets that have row IDs and other numeric values (e.g. lat-long values, dates, ...) encoded as categories with a very large number of distinct values. The server returns the full list of categories in the feature list, so the request takes an enormous amount of time and resources.

Best thing to do is probably to manually fix the encoding in the ARFF file and re-process the datasets. If there are other suggestions, please let me know.
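
To find the attributes that need fixing, a rough check over the ARFF header may help. The following is only a minimal sketch: the function name `suspicious_nominals` and the threshold of 1000 categories are assumptions made for illustration, not anything OpenML defines, and the parser naively splits on commas, so quoted category values that contain commas will be over-counted.

```python
import re

# Matches "@attribute <name> {v1,v2,...}" lines; the name may be quoted or bare.
ATTRIBUTE_RE = re.compile(
    r"""^\s*@attribute\s+('[^']*'|"[^"]*"|\S+)\s+\{(.*)\}\s*$""",
    re.IGNORECASE,
)


def suspicious_nominals(lines, max_categories=1000):
    """Return (name, category_count) for nominal attributes above the threshold.

    `max_categories` is an arbitrary illustrative cutoff (an assumption).
    Only the header is scanned; parsing stops at the @data marker.
    """
    flagged = []
    for line in lines:
        if line.strip().lower() == "@data":
            break  # the header ends here
        match = ATTRIBUTE_RE.match(line)
        if match is None:
            continue  # numeric, string, date, or non-attribute line
        name = match.group(1).strip("'\"")
        values = match.group(2)
        # Naive comma split: quoted values containing commas are over-counted.
        count = len(values.split(",")) if values.strip() else 0
        if count > max_categories:
            flagged.append((name, count))
    return flagged
```

Passing an open file works, e.g. `suspicious_nominals(open("dataset.arff"))`, where `dataset.arff` is a placeholder path. Attributes flagged this way are candidates for re-encoding as `numeric` or `date` before re-processing.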

joaquinvanschoren avatar Oct 22 '21 10:10 joaquinvanschoren

There are new ones like: https://www.openml.org/api/v1/json/data/features/44538

mitar avatar Jul 27 '23 20:07 mitar

Thanks, we'll look into these.

joaquinvanschoren avatar May 16 '24 15:05 joaquinvanschoren

@joaquinvanschoren Actually, after a lot more testing I figured out I was wrong. The code I was using had a relatively small timeout (1 minute) and these took close to two minutes to load. Sorry for the confusion and thank you for the response. I really like OpenML. I appreciate everything you're doing.
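
For anyone hitting the same thing, a client-side timeout well above two minutes seems to be the practical workaround. A minimal sketch using only the Python standard library follows; the helper names `features_url` and `fetch_features` and the 180-second default are assumptions for illustration and are not part of any OpenML client.

```python
import json
import urllib.request

# URL pattern taken from the endpoints reported in this thread.
BASE_URL = "https://www.openml.org/api/v1/json/data/features/{}"


def features_url(dataset_id):
    """Build the feature-description URL for a dataset ID."""
    return BASE_URL.format(dataset_id)


def fetch_features(dataset_id, timeout=180.0, opener=urllib.request.urlopen):
    """Fetch and decode the JSON feature description.

    `timeout` is in seconds; 180 is an assumed value chosen to exceed the
    roughly two-minute load times mentioned above. `opener` is injectable
    so the function can be exercised without network access.
    """
    with opener(features_url(dataset_id), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_features(44538)` would then wait up to three minutes before giving up. Datasets with very large category lists may still exceed that.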

mrucker avatar May 16 '24 16:05 mrucker

Great to hear! There are still a few that fail, mainly datasets with huge numbers of features. We might opt to resolve this in the new REST API, which we hope to deploy late this summer.

joaquinvanschoren avatar May 16 '24 16:05 joaquinvanschoren