Quentin Lhoest

416 comments by Quentin Lhoest

Maybe we can just add a note in the `Value` documentation?

The script uses `Dataset.load_from_disk`, which, as you might expect, doesn't work in streaming mode. It would probably be more practical to load the dataset locally using `Dataset.load_from_disk` first and then...

Alright, this is ready for review :) I'd mostly like your opinion on the YAML structure and what we can do in the docs (IMO we can add the...

We plan to do a release today; we'll merge this after the release :) EDIT: actually tomorrow

Created https://github.com/huggingface/datasets/pull/5018, where I added the YAML `dataset_info` for every single dataset in this repo. See for example these dataset cards: [imagenet-1k](https://github.com/huggingface/datasets/blob/040102f100964a33fd334e2695f1c493fa6b92db/datasets/imagenet-1k/README.md), [glue](https://github.com/huggingface/datasets/blob/040102f100964a33fd334e2695f1c493fa6b92db/datasets/glue/README.md), [flores](https://github.com/huggingface/datasets/blob/040102f100964a33fd334e2695f1c493fa6b92db/datasets/flores/README.md), [gem](https://github.com/huggingface/datasets/blob/040102f100964a33fd334e2695f1c493fa6b92db/datasets/gem/README.md)

Took your comments into account and updated `push_to_hub` to push the `dataset_info` to the README.md instead of a JSON file :) Let me know if it sounds good to you now!

Hi! Sorry to hear that. This may come from a different issue, then. First, can we check whether this latency comes from the dataset itself? You can try to...

I'm surprised by the speed change. Can you give more details about your dataset? The speed depends on the number of batches in the Arrow tables and the distribution...
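To illustrate why the number of batches can matter, here is a simplified model, which is an assumption for illustration and not the actual `datasets` internals: a table stored as several record batches keeps the cumulative row offset of each batch, maps a global row index to a batch with a binary search, and has to stitch together one piece per batch when reading a contiguous slice. The helpers `batch_offsets`, `locate_row` and `batches_touched` are hypothetical names.

```python
import bisect
from itertools import accumulate


def batch_offsets(batch_lengths):
    """Start offset of each batch, followed by the total number of rows (hypothetical helper)."""
    return [0, *accumulate(batch_lengths)]


def locate_row(offsets, i):
    """Map a global row index i to (batch_index, row_index_within_batch)."""
    if not 0 <= i < offsets[-1]:
        raise IndexError(i)
    # bisect_right returns the position of the first offset strictly greater than i,
    # so the batch containing row i is the one just before it (empty batches are skipped).
    batch = bisect.bisect_right(offsets, i) - 1
    return batch, i - offsets[batch]


def batches_touched(offsets, start, stop):
    """Number of batches spanned by rows [start, stop), i.e. pieces to stitch together."""
    if stop <= start:
        return 0
    first, _ = locate_row(offsets, start)
    last, _ = locate_row(offsets, stop - 1)
    return last - first + 1


# Same 1,000 rows, split into 10 batches of 100 rows vs. 1,000 batches of 1 row.
few_batches = batch_offsets([100] * 10)
many_batches = batch_offsets([1] * 1000)
print(batches_touched(few_batches, 0, 1000))   # 10
print(batches_touched(many_batches, 0, 1000))  # 1000
```

Under this model, reading the same rows from a table made of many tiny batches means far more per-batch work, which is one plausible source of a slowdown; the real cost also depends on how the rows are distributed across batches.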

Also, could you give us more info about your environment, such as your OS, your version of pyarrow, and whether you're using an HDD or an SSD?
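A minimal sketch for collecting part of this information using only the Python standard library. The function name `env_report` is hypothetical, and the disk type (HDD vs. SSD) is not detected here, so it still has to be reported by hand.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def env_report(packages=("pyarrow", "datasets")):
    """Collect OS, Python version and installed package versions (hypothetical helper)."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value}")
```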

Hi! Sorry for the delay, I haven't had a chance to take a look at this yet. Are you still experiencing this issue? I'm asking because the latest...