dataset-viewer
dataset-viewer copied to clipboard
Return partial dataset-hub-cache instead of error?
dataset-hub-cache
depends on multiple previous steps, and any error in one of them makes it fail. It provokes things like https://github.com/huggingface/moon-landing/issues/9799 (internal): in the datasets list, a dataset is not marked as "supporting the dataset viewer", whereas the only issue is that we didn't manage to list the compatible libraries, to create the tags.
https://github.com/huggingface/dataset-viewer/blob/main/services/worker/src/worker/job_runners/dataset/hub_cache.py
In this case, we could return a partial response, or maybe return an empty list of libraries or modalities if we have an error.
What do you think @lhoestq?
In fact, it would make a lot more sense to always return the status of the dataset (ie. the list of the capabilities), instead of propagating an error.
Maybe we can just add a try/except in compatible-libraries and return something empty + the reason why it's empty
yes, an empty list of tags/libraries/modalities/formats is OK (no need to pass a reason in my opinion, since it's a custom endpoint for the Hub).
And we should always pass the other fields: preview, viewer, partial, num_rows (num_rows == 0 is OK if the other fields are false, I think)
https://github.com/huggingface/dataset-viewer/blob/9d4d0d2e69301e5eee3b68a889dd22c7cbe96642/services/worker/src/worker/dtos.py#L254