
notebook: add argilla dataset creation (and uploading everything)

Open stefan-it opened this issue 2 years ago • 3 comments

Hi,

this notebook performs the following operations:

  • Use the translated dataset and enrich it with metadata (e.g. translation model, original id, and even sentence embeddings)
  • Create a Hugging Face Dataset and upload it to the hub
  • Create an Argilla dataset
  • Upload the created Argilla dataset to our Hugging Face Space demo
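
As a rough illustration of the first step, the enrichment could look like the sketch below. It is not the notebook's actual code: the function name `enrich_record`, the sample record, and the model name `hypothetical-translation-model` are all made up for illustration, and sentence embeddings are left out to keep the example dependency-free.

```python
from typing import Any


def enrich_record(
    record: dict[str, Any], original_id: int, translation_model: str
) -> dict[str, Any]:
    """Return a copy of a translated record with a metadata dict attached."""
    enriched = dict(record)  # shallow copy, the input record stays untouched
    enriched["metadata"] = {
        "original_id": original_id,
        "translation_model": translation_model,
    }
    return enriched


# Made-up sample in the Alpaca layout (instruction / input / output).
translated = [
    {
        "instruction": "Nenne drei Grundfarben.",
        "input": "",
        "output": "Rot, Blau und Gelb.",
    },
]

records = [
    enrich_record(r, original_id=i, translation_model="hypothetical-translation-model")
    for i, r in enumerate(translated)
]

# Assumption: with the `datasets` library installed, a list like `records`
# could then be wrapped via Dataset.from_list(records) and uploaded with
# push_to_hub(...); that step is not executed here.
```

Keeping the extra information in one `metadata` dict, rather than in separate columns, means the record fields themselves stay limited to instruction, input and output.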

stefan-it avatar Mar 31 '23 19:03 stefan-it

Is it an Argilla requirement that the metadata column must contain dicts? What is the purpose of the metadata column and why don't we split the info in the dicts into multiple columns?

PhilipMay avatar Mar 31 '23 21:03 PhilipMay

I do not understand the column renaming. Why the underscore and why 2 renames that do not change anything?

PhilipMay avatar Mar 31 '23 22:03 PhilipMay

Is it an Argilla requirement that the metadata column must contain dicts? What is the purpose of the metadata column and why don't we split the info in the dicts into multiple columns?

The underscore for the instruction field is due to our current limitation on field ordering. We need it so that the instruction field is shown at the top of the record. We plan to fix this soon. The other renames are indeed not needed; they were old code I provided to @stefan-it
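
A small sketch of why the underscore helps, under the assumption (not confirmed in this thread) that fields end up displayed in plain alphabetical order. The helper name `rename_for_display` and the placeholder values are hypothetical.

```python
def rename_for_display(record: dict[str, str]) -> dict[str, str]:
    """Rename 'instruction' to '_instruction' so it sorts before the other fields."""
    renamed = dict(record)  # copy, so the caller's record is not modified
    renamed["_instruction"] = renamed.pop("instruction")
    return renamed


original = {"instruction": "text a", "input": "text b", "output": "text c"}

# Without the rename, 'input' sorts before 'instruction'.
order_before = sorted(original)

# With the rename, '_' (ASCII 95) sorts before any lowercase letter.
order_after = sorted(rename_for_display(original))

print(order_before)
print(order_after)
```

Only the `instruction` field needs the prefix; `input` and `output` already fall into the desired order.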

dvsrepo avatar Apr 03 '23 10:04 dvsrepo