GerAlpacaDataCleaned
notebook: add argilla dataset creation (and uploading everything)
Hi,
this notebook performs the following operations:
- Use the translated dataset and enrich it with metadata (e.g. translation model, original ID and even sentence embeddings)
- Create a Hugging Face Dataset and upload it to the hub
- Create an Argilla dataset
- Upload the created Argilla dataset to our Hugging Face Space demo
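The enrichment step in the first bullet could look roughly like the sketch below. It is not the notebook's code: the helper name `enrich_records`, the metadata keys `translation_model` and `original_id`, and the assumption that the original ID equals the row position are all hypothetical. Only the Alpaca field names (`instruction`, `input`, `output`) come from the dataset format. Sentence embeddings are left out because they depend on a model choice the thread does not specify.

```python
from typing import Any


def enrich_records(
    translated: list[dict[str, str]],
    translation_model: str,
) -> list[dict[str, Any]]:
    """Attach a metadata dict to each translated Alpaca-style record.

    Hypothetical helper: the metadata keys are illustrative and may not
    match the notebook's actual schema.
    """
    enriched = []
    # Assumption: the original ID is simply the row position in the source data.
    for original_id, record in enumerate(translated):
        enriched.append({
            "instruction": record["instruction"],
            # Many Alpaca records have an empty "input"; default to "".
            "input": record.get("input", ""),
            "output": record["output"],
            # Provenance is kept together in a single dict-valued column.
            "metadata": {
                "translation_model": translation_model,
                "original_id": original_id,
            },
        })
    return enriched
```

The resulting list could then be turned into a Hugging Face dataset with `datasets.Dataset.from_list(...)` and uploaded with its `push_to_hub(...)` method.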
Is it an Argilla requirement that the metadata column must contain dicts? What is the purpose of the metadata column and why don't we split the info in the dicts into multiple columns?
I do not understand the column renaming. Why the underscore, and why are there two renames that do not change anything?
The underscore in the instruction field name works around a current limitation in field ordering: we need it so that the instruction field is shown at the top of the record. We plan to fix this soon. The other renames are indeed not needed; they were old code I provided to @stefan-it.
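To illustrate why a leading underscore can move a field to the top, here is a minimal sketch that assumes fields are displayed in plain lexicographic order. That is an assumption; the reply only says there is an ordering limitation. The helper names `order_fields` and `rename_for_top` are hypothetical.

```python
def order_fields(names: list[str]) -> list[str]:
    # Assumption: fields are shown in plain lexicographic (code point) order.
    return sorted(names)


def rename_for_top(record: dict[str, str], field: str = "instruction") -> dict[str, str]:
    # Prefix the field with "_" so it sorts first: "_" (0x5F) comes before
    # all lowercase letters, though after uppercase ones.
    renamed = dict(record)
    renamed["_" + field] = renamed.pop(field)
    return renamed
```

With the fields `instruction`, `input` and `output`, plain sorting puts `input` first; after the rename, `_instruction` comes first. Renames that map a column to its own name would leave this order unchanged, which matches the observation that they are not needed.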