Open-Assistant
Add UA-SQuAD dataset for Ukrainian language
This is a Ukrainian version of the Stanford Question Answering Dataset (SQuAD), a QA dataset released under the MIT license. Although the page says it is a WIP, the dataset already contains a good amount of data. I will try reaching out to the authors to find out whether they have a more complete version.
GitHub link: https://github.com/fido-ai/ua-datasets/tree/main/ua_datasets/src/question_answering
The repo also contains Text Classification and Token Classification datasets, but I'm not sure if they are useful for OA. https://github.com/fido-ai/ua-datasets
Info about dataset:
Number of samples: 13 859
Number of questions without answers: 2 927
File size: 17.1 MB
Link to Hugging Face dataset: https://huggingface.co/datasets/FIdo-AI/ua-squad
Thank you.
@ontocord
Should I proceed to add it here? https://github.com/LAION-AI/Open-Assistant/blob/main/model/model_training/custom_datasets/qa_datasets.py
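For discussion, here is a rough sketch of how the records could be flattened into question/answer pairs before wiring them into the training code. It assumes the files follow the standard SQuAD 2.0 JSON layout (`data` → `paragraphs` → `qas`), which I have not verified for UA-SQuAD. The `SAMPLE` record and the `iter_qa_pairs` helper are hypothetical names made up for illustration and are not part of Open-Assistant or ua-datasets.

```python
from typing import Iterator, Optional, Tuple

# Hypothetical sample in the standard SQuAD 2.0 layout. The real UA-SQuAD
# field names are an assumption and should be checked against the files.
SAMPLE = {
    "data": [
        {
            "title": "Київ",
            "paragraphs": [
                {
                    "context": "Київ є столицею України.",
                    "qas": [
                        {
                            "question": "Яке місто є столицею України?",
                            "answers": [{"text": "Київ", "answer_start": 0}],
                        },
                        {
                            # A question without an answer (empty answer list).
                            "question": "Скільки мостів у Києві?",
                            "answers": [],
                        },
                    ],
                }
            ],
        }
    ]
}


def iter_qa_pairs(
    squad: dict, keep_unanswerable: bool = False
) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (context, question, answer) triples from SQuAD-style JSON.

    Questions without answers are skipped unless keep_unanswerable is True,
    in which case they are yielded with answer set to None.
    """
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = [a["text"] for a in qa.get("answers", [])]
                if answers:
                    # Use the first listed answer as the reference answer.
                    yield context, qa["question"], answers[0]
                elif keep_unanswerable:
                    yield context, qa["question"], None


answerable = list(iter_qa_pairs(SAMPLE))
everything = list(iter_qa_pairs(SAMPLE, keep_unanswerable=True))
```

Whether the 2 927 questions without answers should be dropped or kept (for example with a fixed "cannot be answered" reply) is an open question, so the sketch leaves that as a flag.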