Add MedImg for streaming
Feature request
Host the MedImg dataset (similar to ImageNet but for biomedical images).
Motivation
There is a clear need for biomedical image foundation models and for large-scale biomedical datasets that are easily streamable. This would be an excellent tool for the biomedical community.
Your contribution
MedImg can be found here.
@mariosasko, @lhoestq, @albertvillanova Hello! Can anyone help, or suggest who could help with this?
Hi ! Feel free to download the dataset and create a Dataset object with it.
Then you'll be able to use push_to_hub() to upload the dataset to HF in Parquet format and make it streamable :)
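A minimal sketch of that workflow, assuming the images have already been downloaded into a local folder with one subfolder per class. The folder layout, the `local_dir` argument, the `label` column, and the repo id are all placeholders for illustration, not details of the actual MedImg release. Pushing requires being logged in to the Hub.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the dataset actually ships.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def list_image_records(local_dir):
    """Return one {"image": path, "label": parent_folder_name} dict per image file."""
    root = Path(local_dir)
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            records.append({"image": str(path), "label": path.parent.name})
    return records


def build_and_push(local_dir, repo_id):
    """Create a Dataset from a local image folder and upload it as Parquet."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import Dataset, Image

    ds = Dataset.from_list(list_image_records(local_dir))
    # Casting the path column to Image() lets push_to_hub embed the image bytes.
    ds = ds.cast_column("image", Image())
    ds.push_to_hub(repo_id)  # hypothetical repo id, e.g. "your-username/medimg"
```

Once uploaded, the dataset should be readable without a full download via `load_dataset(repo_id, split="train", streaming=True)`.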
The dataset is several TB in total, which I do not have the resources to handle.
Hi @lhoestq and @albertvillanova , just following up about this.
For big datasets, you can push_to_hub one part at a time (e.g. as different splits) and then merge the parts (just a simple modification in the YAML part of the README).
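A rough sketch of the part-by-part approach, assuming each part is a `Dataset` small enough to build locally. The `part_N` split names, the helper functions, and the `train` target split are illustrative choices, and the generated YAML assumes `push_to_hub` stores its Parquet shards under `data/` with the split name as a file prefix, which is worth verifying against the files actually present in the repo.

```python
def part_split_name(index):
    """Name of the temporary split used for the index-th part (hypothetical scheme)."""
    return f"part_{index}"


def push_in_parts(parts, repo_id):
    """Push each part as its own split so only one part is held locally at a time.

    `parts` can be a generator that builds one Dataset per iteration.
    """
    for index, part in enumerate(parts):
        part.push_to_hub(repo_id, split=part_split_name(index))


def merged_configs_yaml(merged_split="train", part_prefix="part_"):
    """YAML for the README header that maps every part file to a single split."""
    lines = [
        "configs:",
        "- config_name: default",
        "  data_files:",
        f"  - split: {merged_split}",
        f"    path: data/{part_prefix}*",
    ]
    return "\n".join(lines) + "\n"
```

The text returned by `merged_configs_yaml()` would replace the per-part `configs` entries between the `---` markers at the top of the dataset README. Any `dataset_info` block listing the individual part splits would likely need to be updated or removed as well.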
Sure, that makes sense. However, isn't there a size limit to what typical users can push?
Yes, there is a limit. Simply let us know by email at datasets [at] huggingface.co. This way we can give you a storage grant and also help make sure the dataset is in good shape for people to use easily.
Got it, that would be great.