mteb
Add turkic datasets
Checklist for adding MMTEB dataset
Reason for dataset addition:
- [x] I have tested that the dataset runs with the `mteb` package.
- [x] I have run the following models on the task (adding the results to the PR). These can be run using the `mteb run -m {model_name} -t {task_name}` command.
  - [x] `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
  - [x] `intfloat/multilingual-e5-small`
- [x] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
- [x] If the dataset is too big (e.g. >2048 examples), consider using `self.stratified_subsampling()` under `dataset_transform()`.
- [x] I have filled out the metadata object in the dataset file (find documentation on it here).
- [x] Run tests locally to make sure nothing is broken using `make test`.
- [x] Run the formatter to format the code using `make lint`.
- [ ] I have added points for my submission to the points folder using the PR number as the filename (e.g. `438.jsonl`).
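
For readers unfamiliar with the subsampling item in the checklist, the idea can be sketched in plain Python. This is a minimal illustration only, not the `mteb` implementation: the function name `stratified_subsample`, the `label_key` parameter, and the default seed are hypothetical, and the 2048 cap is taken from the checklist's example threshold.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, label_key="label", n_samples=2048, seed=42):
    """Return at most n_samples examples, keeping label proportions roughly intact.

    Hypothetical helper for illustration; it does not mirror mteb's own
    stratified_subsampling() signature.
    """
    if len(examples) <= n_samples:
        return list(examples)

    rng = random.Random(seed)

    # Group the examples by their label.
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    # Sample the same fraction from every label group.
    fraction = n_samples / len(examples)
    sampled = []
    for label in sorted(by_label, key=str):
        group = by_label[label]
        # Keep at least one example per label so no class disappears.
        k = max(1, round(len(group) * fraction))
        sampled.extend(rng.sample(group, k))

    rng.shuffle(sampled)
    # Rounding can overshoot by a few examples, so enforce the cap.
    return sampled[:n_samples]


if __name__ == "__main__":
    data = [{"text": f"doc {i}", "label": 0} for i in range(3000)]
    data += [{"text": f"doc {i}", "label": 1} for i in range(1000)]
    subset = stratified_subsample(data)
    print(len(subset), sum(1 for ex in subset if ex["label"] == 1))
```

In an actual task, the checklist's suggestion is to call the library's own `self.stratified_subsampling()` inside `dataset_transform()` rather than a hand-written helper like this one.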
This is still in progress; tag me when you finish it.
@asparius this is ready for review. Thanks!
You didn't push results for `intfloat__multilingual-e5-small`.
@shreeya-dhakal seems like everything is good here. I will set it to merge. Feel free to submit a PR with the points.