FastChat
FastChat copied to clipboard
Language distribution of ShareGPT 70K conversation dataset for FastChat T5
What are all the languages present in the ShareGPT 70,000 conversation dataset which was used to fine-tune FastChat-T5?
The ReadMe file points to data_cleaning.md which was used to get data from ShareGPT. Within data_cleaning.md seems like sharegpt_clean_lang.json contains the list of languages in consideration and some languages are skipped.
how can i finetune with bounds of datasets?
What are all the languages present in the ShareGPT 70,000 conversation dataset which was used to fine-tune FastChat-T5?
The ReadMe file points to
data_cleaning.mdwhich was used to get data from ShareGPT. Withindata_cleaning.mdseems likesharegpt_clean_lang.jsoncontains the list of languages in consideration and some languages are skipped.
Hi I have the same question about the language distribution, do you have any idea?