Improve `Create a dataset` tutorial
Our tutorial on how to create a dataset is a bit misleading.
- The "Folder-based builders" section says that we have two folder-based builders as standard builders, but we also have similar builders (which can be created from a directory containing data in the required format) for `csv`, `json`/`jsonl`, `parquet` and `txt` files. We have info about these loaders in a separate guide for loading, but it's worth briefly mentioning them in the beginning tutorial, both because they are more common and for consistency. It would be helpful to add a link to the full guide.
- The "From local files" section lists methods for creating a dataset from in-memory data, which are also described in the loading guide.
Maybe we should actually rethink and restructure this tutorial somehow.
I can work on this. The link to the tutorial seems to be broken though @polinaeterna.
@isunitha98selvan that would be great, thank you! Which link are you talking about? I think it should work: https://huggingface.co/docs/datasets/create_dataset
Hey, I don't mind working on this issue. From my understanding, in the folder-based builders section we want to let the reader know that they can also build datasets from csv, json/jsonl, parquet and txt files, and include a link to the full guide. Then, in the "From local files" section, we just want to list the methods for creating a dataset from in-memory data, such as `.from_dict()`.
Hey @polinaeterna, I have a pull request for this issue. Can you review and see if it needs any changes?