[WIP] Docs for creating a loading script for image datasets

Open stevhliu opened this issue 2 years ago • 1 comments

This PR is a first draft of creating a loading script for image datasets. Feel free to let me know if there are any specifics I'm missing for this. 🙂

To do:

  • [x] Document how to create different configurations.

stevhliu avatar Aug 02 '22 20:08 stevhliu

The documentation is not available anymore as the PR was closed or merged.

IMO it would make more sense to add a "Create image dataset" page with two main sections - a no-code approach with imagefolder + metadata (preferred way), and with a loading script (advanced). It should be clear when to choose which. If we leave this as-is, the user who jumps straight to the Vision section could be under the impression that writing a loading script is the preferred way to share a vision dataset due to how this subsection starts:

Write a dataset loading script to share a dataset.

Also, I think a note explaining how to make a dataset gated/disable the viewer to hide the data would be beneficial (it's pretty common to require submitting a form to access a CV dataset).

mariosasko avatar Aug 19 '22 17:08 mariosasko

Great suggestion @mariosasko! I added your suggestions, let me know what you think. For gated dataset access, I just added a tip referring users to the relevant docs since it's more of a Hub feature than a datasets feature.

stevhliu avatar Aug 20 '22 00:08 stevhliu

Thanks, looks much better now :). I would also move the sections explaining how to create an imagefolder for the specific task from the loading page to this one. IMO it makes more sense to have the basic info (imagefolder structure + load_dataset call) there + a link to this page for info on how to create an image folder dataset.

mariosasko avatar Aug 29 '22 11:08 mariosasko
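
For readers following the thread, the "imagefolder structure + load_dataset call" mentioned above can be sketched roughly as follows. This is a minimal illustration, not the text that ended up in the docs: the helper names `make_imagefolder_skeleton` and `load_image_folder`, and the `cat`/`dog` labels, are hypothetical. It assumes an image classification layout where the class name is taken from the directory name.

```python
from pathlib import Path
import tempfile


def make_imagefolder_skeleton(root, labels=("cat", "dog"), splits=("train", "test")):
    """Create root/<split>/<label>/ directories (hypothetical helper).

    For image classification, imagefolder infers the label from the
    directory name, so images would be placed inside these folders.
    """
    root = Path(root)
    created = []
    for split in splits:
        for label in labels:
            class_dir = root / split / label
            class_dir.mkdir(parents=True, exist_ok=True)
            created.append(class_dir.relative_to(root).as_posix())
    return sorted(created)


def load_image_folder(data_dir):
    """Load the folder with the imagefolder builder.

    Requires the `datasets` package and real image files in the folders;
    it is defined here for illustration and not called below.
    """
    from datasets import load_dataset  # lazy import: the skeleton helper needs no dependency

    return load_dataset("imagefolder", data_dir=str(data_dir))


# Build the empty directory layout in a temporary location and list it.
with tempfile.TemporaryDirectory() as tmp:
    layout = make_imagefolder_skeleton(tmp)

print(layout)
```

The `load_dataset("imagefolder", data_dir=...)` call stays the same whatever the task is; only the folder contents (class subfolders or a metadata file) change.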

Good idea! Moved everything about imagefolder + metadata to the create an image dataset section since the load_dataset call is the same for different computer vision tasks.

stevhliu avatar Aug 29 '22 17:08 stevhliu

Thanks for all the feedback! 🥰

What do you think about documenting how to share an ImageFolder dataset in a separate PR? I think we should create a new section under Vision for how to share an image dataset.

stevhliu avatar Sep 06 '22 17:09 stevhliu

I love it, thanks! I think moving forward we can use CSV instead of JSON Lines in the docs ;)

lhoestq avatar Sep 09 '22 17:09 lhoestq
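
As a rough illustration of the CSV suggestion above: an imagefolder dataset can carry extra columns in a `metadata.csv` file placed next to the images, with a `file_name` column linking each row to an image. The sketch below only writes and re-reads such a file with the standard library. The helper name `write_metadata_csv`, the `text` column, and the file names are hypothetical, and no image files are created.

```python
import csv
import tempfile
from pathlib import Path


def write_metadata_csv(folder, rows):
    """Write metadata.csv into `folder` (hypothetical helper).

    The `file_name` column links each row to an image in the same folder;
    `text` is an example of an additional column.
    """
    path = Path(folder) / "metadata.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file_name", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return path


# Placeholder rows; real entries would name actual image files.
rows = [
    {"file_name": "0001.png", "text": "a hypothetical caption"},
    {"file_name": "0002.png", "text": "another hypothetical caption"},
]

with tempfile.TemporaryDirectory() as tmp:
    metadata_path = write_metadata_csv(tmp, rows)
    # Read the file back to check the round trip.
    with metadata_path.open(newline="", encoding="utf-8") as f:
        loaded = list(csv.DictReader(f))

print(loaded)
```

The equivalent JSON Lines file would be named `metadata.jsonl`, with one JSON object per line holding the same keys.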