Quentin Lhoest

Results 416 comments of Quentin Lhoest

We improved speed in `datasets` 2.19 btw, see comments at https://github.com/huggingface/datasets/issues/6800 :)

that only concerns the `~/.cache/huggingface/datasets` cache used only for unzipping content / generating arrow files / etc. and not eligible for scan-cache
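To see how much space each cache takes, here is a minimal sketch. It assumes the usual layout where the `datasets` cache and the Hub cache live side by side under `~/.cache/huggingface` and can be overridden with the `HF_HOME`, `HF_DATASETS_CACHE` and `HF_HUB_CACHE` environment variables. The helper names are hypothetical and not part of any library.

```python
import os
from pathlib import Path


def hf_home() -> Path:
    # Assumption: the default root is ~/.cache/huggingface unless HF_HOME is set.
    return Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))


def datasets_cache_dir() -> Path:
    # Cache used by `datasets` for unzipped content and generated Arrow files.
    return Path(os.environ.get("HF_DATASETS_CACHE", hf_home() / "datasets"))


def hub_cache_dir() -> Path:
    # Cache holding files downloaded from the Hub.
    return Path(os.environ.get("HF_HUB_CACHE", hf_home() / "hub"))


def dir_size_bytes(path: Path) -> int:
    # Sum the sizes of all regular files under `path`; 0 if it does not exist.
    if not path.is_dir():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


if __name__ == "__main__":
    for name, path in [("datasets", datasets_cache_dir()), ("hub", hub_cache_dir())]:
        print(f"{name}: {path} ({dir_size_bytes(path)} bytes)")
```

Only the second directory is the one a Hub cache scan would report on, per the comment above.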

They're not "deprecated" since we'll keep supporting them in `datasets`, they're rather "unsupported for the Viewer". Maybe we should mention it here ? https://huggingface.co/docs/hub/datasets-viewer

Hi ! Having the exact same issue here on any DuckDB index on Hugging Face, e.g. using nvidia/HelpSteer [index.duckdb](https://huggingface.co/datasets/nvidia/HelpSteer/resolve/refs%2Fconvert%2Fduckdb/default/validation/index.duckdb). Do you know a workaround we could use in the meantime...
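For anyone trying to reproduce this on another dataset, the index URLs quoted in these comments follow a visible pattern. The sketch below rebuilds it from a repo id, config and split. The function names are hypothetical, the pattern is inferred only from the two URLs shown here, and this only builds the `ATTACH` statement (it is not a workaround for the error).

```python
from urllib.parse import quote

HUB_BASE = "https://huggingface.co/datasets"
# Git ref where the converted DuckDB files are stored, as seen in the URLs above.
DUCKDB_REF = "refs/convert/duckdb"


def duckdb_index_url(repo_id: str, config: str = "default", split: str = "train") -> str:
    # The ref contains slashes, so it is percent-encoded as a single path segment.
    ref = quote(DUCKDB_REF, safe="")
    return f"{HUB_BASE}/{repo_id}/resolve/{ref}/{config}/{split}/index.duckdb"


def attach_statement(url: str, alias: str) -> str:
    # SQL to pass to a DuckDB connection, e.g. con.sql(...).
    return f"ATTACH '{url}' AS {alias};"


if __name__ == "__main__":
    url = duckdb_index_url("nvidia/HelpSteer", split="validation")
    print(attach_statement(url, "helpsteer"))
```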

Still getting this issue on 0.10.2 btw, e.g. in python:

```python
>>> con.sql("ATTACH 'https://huggingface.co/datasets/fka/awesome-chatgpt-prompts/resolve/refs%2Fconvert%2Fduckdb/default/train/index.duckdb' as fka;")
---------------------------------------------------------------------------
CatalogException                          Traceback (most recent call last)
----> 1 con.sql("ATTACH 'https://huggingface.co/datasets/fka/awesome-chatgpt-prompts/resolve/refs%2Fconvert%2Fduckdb/default/train/index.duckdb'...
```

> Tag all vision datasets on the hub with "vision" such that people can easily retrieve them

Currently we don't tag datasets with an "image" or "text" or "audio" tag...

Ok sounds good :) I'll also adapt the [datasets tagging app](https://huggingface.co/spaces/huggingface/datasets-tagging) to support this field then, and make it compatible for vision datasets (right now it's heavily focused on text...

> (if we move to the new task scheme that you proposed recently, we don't need to hardcode the modality b/c it will be implied by the task)

Yes indeed,...

Maybe we can improve execution time once more (hf_transfer + more CPU), do some comms and check the usage evolution ? I'll also ask about it to viewer users at...