[WIP] Docs for creating a loading script for image datasets
This PR is a first draft of the docs for creating a loading script for image datasets. Feel free to let me know if there are any specifics I'm missing. 🙂
To do:
- [x] Document how to create different configurations.
The documentation is not available anymore as the PR was closed or merged.
IMO it would make more sense to add a "Create image dataset" page with two main sections: a no-code approach with `imagefolder` + metadata (preferred way), and one with a loading script (advanced). It should be clear when to choose which. If we leave this as-is, a user who jumps straight to the Vision section could be under the impression that writing a loading script is the preferred way to share a vision dataset, because of how this subsection starts:
> Write a dataset loading script to share a dataset.
Also, I think a note explaining how to make a dataset gated or disable the viewer to hide the data would be beneficial (it's pretty common to require submitting a form to access a CV dataset).
Great suggestion @mariosasko! I added your suggestions, let me know what you think. For gated dataset access, I just added a tip referring users to the relevant docs, since it's more of a Hub feature than a `datasets` feature.
Thanks, looks much better now :). I would also move the sections explaining how to create an `imagefolder` for the specific task from the loading page to this one. IMO it makes more sense to have the basic info (`imagefolder` structure + `load_dataset` call) there + a link to this page for info on how to create an image folder dataset.
Good idea! Moved everything about `imagefolder` + metadata to the create an image dataset section since the `load_dataset` call is the same for different computer vision tasks.
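For readers following the thread, here is a minimal sketch of the `imagefolder` + metadata layout being discussed. The folder name `train`, the `text` column, the file names, and both helper functions are illustrative assumptions, not taken from the PR; only the `file_name` column and the `load_dataset("imagefolder", data_dir=...)` call reflect the library's documented behavior. The image files are empty placeholders, so the loading helper is defined but not called.

```python
import csv
import tempfile
from pathlib import Path


def build_imagefolder(root: Path, captions: dict[str, str]) -> Path:
    """Create a train/ folder with placeholder files and a metadata.csv beside them."""
    train_dir = root / "train"
    train_dir.mkdir(parents=True, exist_ok=True)
    for name in captions:
        # Placeholder bytes only; a real dataset would hold actual image files here.
        (train_dir / name).write_bytes(b"")
    metadata_path = train_dir / "metadata.csv"
    with metadata_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # `file_name` links each metadata row to an image; `text` is a hypothetical extra column.
        writer.writerow(["file_name", "text"])
        for name, caption in captions.items():
            writer.writerow([name, caption])
    return metadata_path


def load_with_imagefolder(root: Path):
    """Load the folder with the `imagefolder` builder (needs `datasets` and real images)."""
    from datasets import load_dataset  # imported lazily so the layout code runs without it

    return load_dataset("imagefolder", data_dir=str(root))


root = Path(tempfile.mkdtemp())
metadata_path = build_imagefolder(root, {"0001.png": "a cat", "0002.png": "a dog"})
metadata_text = metadata_path.read_text(encoding="utf-8")
```

With real images in place, `load_with_imagefolder(root)` would be the single call that stays the same across tasks; only the extra metadata columns change.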
Thanks for all the feedback! 🥰
What do you think about documenting how to share an `ImageFolder` dataset in a separate PR? I think we should create a new section under Vision for how to share an image dataset.
I love it, thanks! I think moving forward we can use CSV instead of JSON Lines in the docs ;)
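As a rough illustration of the CSV suggestion, the sketch below converts a JSON Lines metadata file into the equivalent CSV text. The helper name `jsonl_to_csv`, the sample rows, and the `text` column are assumptions made for this example; it only handles flat values, so nested fields such as bounding boxes would need a different treatment.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_text: str) -> str:
    """Convert flat JSON Lines records into CSV text with a header row."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    # Collect column names in first-seen order so `file_name` stays first.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buffer.getvalue()


# Hypothetical contents of a metadata.jsonl file.
jsonl_metadata = (
    '{"file_name": "0001.png", "text": "a cat"}\n'
    '{"file_name": "0002.png", "text": "a dog"}\n'
)
csv_metadata = jsonl_to_csv(jsonl_metadata)
print(csv_metadata)
```

The result could be saved as `metadata.csv` next to the images in place of `metadata.jsonl`.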