distilabel
[FEATURE] Update `PushToHub` to stream data to the Hub
Is your feature request related to a problem? Please describe.
Some users would like to inspect the dataset on the Hugging Face Hub while it is still being created.
Describe the solution you'd like
Initially we had `DatasetCheckpoint` to push the dataset to the Hub while the pipeline was being executed; we could offer similar functionality.
Describe alternatives you've considered
Wait until the end of the pipeline.
Additional context
We have the `PushToHub` step, but it's global; maybe we can just adapt it to push with a given batch size.
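A rough sketch of the batching idea, to make the proposal concrete. This is not distilabel's API: `BatchedPusher`, `push_fn`, and the shard index are hypothetical names chosen for illustration. The sketch only shows the buffering logic; `push_fn` stands in for whatever would actually upload a chunk of rows to the Hub.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class BatchedPusher:
    """Hypothetical helper: buffers rows and pushes them every `batch_size` rows."""

    def __init__(self, push_fn: Callable[[List[Row], int], None], batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        # `push_fn(rows, shard_index)` is an assumed callback that uploads one chunk.
        self.push_fn = push_fn
        self.batch_size = batch_size
        self._buffer: List[Row] = []
        self._shard_index = 0

    def add(self, rows: List[Row]) -> None:
        """Buffer incoming rows and push every full chunk of `batch_size` rows."""
        self._buffer.extend(rows)
        while len(self._buffer) >= self.batch_size:
            chunk = self._buffer[: self.batch_size]
            self._buffer = self._buffer[self.batch_size :]
            self._push(chunk)

    def flush(self) -> None:
        """Push whatever is left in the buffer, e.g. when the pipeline finishes."""
        if self._buffer:
            chunk, self._buffer = self._buffer, []
            self._push(chunk)

    def _push(self, chunk: List[Row]) -> None:
        self.push_fn(chunk, self._shard_index)
        self._shard_index += 1
```

With something like this, the step would call `add` on each incoming batch and `flush` once at the end, so partial results would become visible on the Hub as the pipeline runs instead of only after the last batch.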