Release iNatAg datasets on Hugging Face
Hi @amogh7joshi 🤗
Niels here from the open-source team at Hugging Face. I discovered your work through Hugging Face's daily papers, where yours was featured: https://huggingface.co/papers/2503.20068. The paper page lets people discuss your paper and find artifacts related to it (your datasets, for instance). You can also claim the paper as yours, which will show it on your public HF profile, and add GitHub and project page URLs.
It'd be great to make the iNatAg and iNatAg-mini datasets available on the 🤗 hub, to improve their discoverability/visibility. We can add tags so that people find them when filtering https://huggingface.co/datasets.
Uploading dataset
It would be awesome to make the dataset available on 🤗, so that people can do:
```python
from datasets import load_dataset

# Replace the placeholder with the actual Hub repo id.
dataset = load_dataset("your-hf-org-or-username/your-dataset")
```
See here for a guide: https://huggingface.co/docs/datasets/loading.
Besides that, there's the dataset viewer, which lets people quickly explore the first few rows of the data in the browser.
Let me know if you're interested/need any help regarding this!
Cheers,
Niels
ML Engineer @ HF 🤗
@NielsRogge We're trying to host it on the newly created https://huggingface.co/Project-AgML. However, we're running into issues doing this as an organization without the Enterprise feature. We're an open-source community of developers and won't be able to pay for Enterprise. Is there a way around this?
@NielsRogge It looks like we've found a workaround. So disregard the previous message.
Ok, let me know if you need any assistance.
Regarding storage limits for datasets, up to 300GB is free: https://huggingface.co/docs/hub/en/storage-limits#sharing-large-datasets-on-the-hub. You can also apply for community grants.
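Before uploading, it can help to check a local copy against the 300GB figure above. The sketch below is a hypothetical helper, not a Hub tool; it assumes "300GB" means 300 * 10**9 bytes and counts only regular files (symlinks are skipped).

```python
import os

# Assumption: "300GB" read as decimal gigabytes (300 * 10**9 bytes).
FREE_LIMIT_BYTES = 300 * 10**9


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_free_tier(root: str, limit: int = FREE_LIMIT_BYTES) -> bool:
    """Hypothetical check: True if the files under `root` fit within `limit` bytes."""
    return directory_size_bytes(root) <= limit
```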
Btw, very cool project! For example, we have a tutorial on how to train a ViT image classifier on the beans dataset to classify healthy vs. diseased leaves: https://huggingface.co/blog/fine-tune-vit.
It would be really cool to make all the image datasets available on the hub 🤗. This guide might be helpful: https://huggingface.co/docs/datasets/en/image_dataset
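The linked guide covers the ImageFolder layout. When images need extra columns (for example a species name), ImageFolder can also read a `metadata.csv` placed next to the images, with a `file_name` column giving each image path relative to that file. A minimal sketch for writing one; the helper name and the `label` column are illustrative assumptions, not part of the guide.

```python
import csv
import os
from typing import Iterable, Mapping


def write_imagefolder_metadata(data_dir: str, rows: Iterable[Mapping[str, str]]) -> str:
    """Write data_dir/metadata.csv for an ImageFolder-style dataset (hypothetical helper).

    Every row needs a "file_name" key relative to data_dir; any other keys
    (e.g. an illustrative "label") become extra columns.
    """
    row_list = list(rows)
    if not row_list:
        raise ValueError("at least one row is required")
    fieldnames = list(row_list[0].keys())
    if "file_name" not in fieldnames:
        raise ValueError('ImageFolder metadata needs a "file_name" column')
    # Fail early if a row points at an image that is not on disk.
    missing = [
        r["file_name"]
        for r in row_list
        if not os.path.isfile(os.path.join(data_dir, r["file_name"]))
    ]
    if missing:
        raise FileNotFoundError(f"images not found under {data_dir}: {missing}")
    path = os.path.join(data_dir, "metadata.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(row_list)
    return path
```

Loading the directory with `load_dataset("imagefolder", data_dir=...)` should then pick up the extra columns alongside the images.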
@NielsRogge We have created a discussion post requesting storage grants at this link: https://huggingface.co/spaces/Project-AgML/README/discussions/1#67e87c65ea97f3c65c03b7db
@NielsRogge The iNatAg dataset is up on Hugging Face: https://huggingface.co/datasets/Project-AgML/iNatAg
iNatAg-mini is currently being uploaded.
@NielsRogge The iNatAg-mini dataset is also up on Hugging Face: https://huggingface.co/datasets/Project-AgML/iNatAg-mini
Great! Which format are you using to store the data? Clicking on the "Files and versions" tab seems to return a 504 error.
We might look into using webdataset: https://huggingface.co/docs/hub/en/datasets-webdataset
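One way to act on the WebDataset suggestion is to pack images into numbered `.tar` shards, where files sharing a basename (the sample "key") form one sample. The sketch below uses only Python's `tarfile`; the `shard-000000.tar` naming, the `.jpg`/`.txt` extensions, and the default shard size are illustrative assumptions, not settings required by AgML or the Hub.

```python
import io
import os
import tarfile
from typing import Iterable, List, Tuple


def _add_bytes(tar: tarfile.TarFile, name: str, payload: bytes) -> None:
    # Add an in-memory payload to the archive under `name`.
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def write_webdataset_shards(
    out_dir: str,
    samples: Iterable[Tuple[str, bytes, str]],
    samples_per_shard: int = 1000,
) -> List[str]:
    """Pack (key, image_bytes, label) samples into numbered tar shards.

    Files that share a key (e.g. 000001.jpg and 000001.txt) belong to one sample.
    """
    if samples_per_shard < 1:
        raise ValueError("samples_per_shard must be >= 1")
    paths: List[str] = []
    tar = None
    in_shard = 0
    try:
        for key, image_bytes, label in samples:
            # A dot in the key would blur where the key ends and the extension starts.
            if "." in key:
                raise ValueError(f"key must not contain '.': {key!r}")
            # Start a new shard when none is open or the current one is full.
            if tar is None or in_shard == samples_per_shard:
                if tar is not None:
                    tar.close()
                path = os.path.join(out_dir, f"shard-{len(paths):06d}.tar")
                tar = tarfile.open(path, "w")
                paths.append(path)
                in_shard = 0
            _add_bytes(tar, f"{key}.jpg", image_bytes)
            _add_bytes(tar, f"{key}.txt", label.encode("utf-8"))
            in_shard += 1
    finally:
        if tar is not None:
            tar.close()
    return paths
```

Packing many small image files into a few large shards keeps the number of files in the repo much lower; the linked docs describe how the Hub handles such shards.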
Hi everyone! I'm new to open-source contributions and really interested in getting involved with AgML. I saw this issue is still open and was wondering if there’s anything that still needs work, or if there are any beginner-friendly tasks related to dataset integration? I'd love to help and learn as I go, so if anyone has guidance on where I could start, I'd really appreciate it!
Great! One thing that could be done is to take any image classification dataset from the GitHub README and make it available on the hub. For example, this one (https://github.com/Project-AgML/AgML/blob/main/docs/datasets/bean_disease_uganda.md) could easily be turned into an HF dataset by following this guide: https://huggingface.co/docs/datasets/en/image_dataset. Then you can call `dataset.push_to_hub("your-hf-username/bean-disease-uganda")` to make it available to everyone.
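A minimal sketch of those two steps, assuming the bean images are available locally as `(image_path, label)` pairs. `build_imagefolder` and `push_imagefolder` are hypothetical helper names. Folder names under `train/` become the class labels when the directory is loaded with the `imagefolder` builder; the upload step needs the `datasets` library and a logged-in Hugging Face account.

```python
import os
import shutil
from typing import Iterable, Tuple


def build_imagefolder(
    pairs: Iterable[Tuple[str, str]], data_dir: str, split: str = "train"
) -> int:
    """Copy (image_path, label) pairs into data_dir/<split>/<label>/<file name>."""
    copied = 0
    for image_path, label in pairs:
        target_dir = os.path.join(data_dir, split, label)
        os.makedirs(target_dir, exist_ok=True)
        target = os.path.join(target_dir, os.path.basename(image_path))
        # Refuse to silently overwrite two source images that share a file name.
        if os.path.exists(target):
            raise FileExistsError(target)
        shutil.copy2(image_path, target)
        copied += 1
    return copied


def push_imagefolder(data_dir: str, repo_id: str) -> None:
    """Load data_dir with the "imagefolder" builder and upload it to repo_id."""
    # Imported here so build_imagefolder works even without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset("imagefolder", data_dir=data_dir)
    dataset.push_to_hub(repo_id)
```

For example, with illustrative labels such as `healthy` and `bean_rust`, `build_imagefolder(pairs, "bean-disease-uganda")` followed by `push_imagefolder("bean-disease-uganda", "your-hf-username/bean-disease-uganda")`.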
Hi @Darcieg, thanks for your interest!
We're always looking to expand the datasets available in AgML, and if you're interested, you could help by contributing more datasets. Anything that fits within the broad range of agriculture-related image datasets would be a great addition! You can follow the contribution guide and, once you've found a dataset that would be a good addition, let us know by opening a PR!
Awesome--thank you!
On Mon, May 26, 2025 at 1:02 AM NielsRogge @.***> wrote:
NielsRogge left a comment (Project-AgML/AgML#73) https://github.com/Project-AgML/AgML/issues/73#issuecomment-2908906478
Hi, folks--
Sorry for the delay--the dataset is now available at https://huggingface.co/datasets/darcieg/bean-disease-uganda.
I also created a PR to replace the existing link, which led to a page with no dataset, with my link. The PR is "Replace placeholder dataset link with live Hugging Face version" (AI-Lab-Makerere/ibean#6): https://github.com/AI-Lab-Makerere/ibean/pull/6
Let me know if everything looks good!
Thanks. -Darcie
On Tue, May 27, 2025 at 11:47 AM Darcie Gurley @.***> wrote:
Re-reading what Amogh said, would you like me to create the PR directly on the Project-AgML repo?
On Sat, Jun 28, 2025 at 9:38 AM Darcie Gurley @.***> wrote: