[FEATURE] Update `PushToHub` to stream data to the Hub

Open • plaguss opened this issue 6 months ago • 4 comments

Is your feature request related to a problem? Please describe.
Some users would like to inspect the dataset on the Hugging Face Hub while it is still being created.

Describe the solution you'd like
Initially we had `DatasetCheckpoint`, which pushed the dataset to the Hub while the pipeline was running. We could offer similar functionality again.

Describe alternatives you've considered
Wait until the end of the pipeline.

Additional context
We have the `PushToHub` step, but it's global, so it only runs once all the data is available. Maybe we can just adapt it to push with a given batch size.
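
To make the batch-size idea concrete, here is a rough sketch of the buffering logic only. It is not distilabel code: `BatchedPusher`, `push_fn`, and `batch_size` are hypothetical names for illustration, and `push_fn` is a placeholder for whatever upload call would actually be used. How the shards would be laid out in the Hub repo is left open.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class BatchedPusher:
    """Hypothetical helper: buffer rows and hand them off every `batch_size` rows."""

    def __init__(self, push_fn: Callable[[List[Row], int], None], batch_size: int) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._push_fn = push_fn  # placeholder for the real upload call
        self._batch_size = batch_size
        self._buffer: List[Row] = []
        self._shard_index = 0

    def add(self, rows: List[Row]) -> None:
        """Accept rows from a finished batch and flush every full shard."""
        self._buffer.extend(rows)
        while len(self._buffer) >= self._batch_size:
            self._flush(self._batch_size)

    def close(self) -> None:
        """Flush whatever is left once the pipeline has finished."""
        if self._buffer:
            self._flush(len(self._buffer))

    def _flush(self, n: int) -> None:
        # Split off the first n rows as one shard and keep the remainder buffered.
        shard, self._buffer = self._buffer[:n], self._buffer[n:]
        self._push_fn(shard, self._shard_index)
        self._shard_index += 1
```

With `batch_size=2`, feeding a batch of three rows and then a batch of two would hand off two full shards during the run and one final single-row shard on `close()`, so the dataset could be inspected before the pipeline ends.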

plaguss, Aug 07 '24 07:08