mteb Adding FarsTail (Persian) for Pair Classification

Open wissam-sib opened this issue 9 months ago • 1 comment

Pair Classification is still missing some languages, so this PR adds the Persian FarsTail dataset.

[x] I have tested that the dataset runs with the mteb package.
[x] I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
- [x] sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- [x] intfloat/multilingual-e5-small
[x] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
[x] I have filled out the metadata object in the dataset file (find documentation on it here).
[x] Run tests locally to make sure nothing is broken using make test.
[x] Run the formatter to format the code using make lint.
[ ] I have added points for my submission to the points folder using the PR number as the filename (e.g. 438.jsonl).
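
The "neither trivial nor random" item in the checklist above can be sketched as a small check. This is only an illustration: the function name classify_scores, the 0.5 random baseline, and the 0.05 tolerance are assumptions, not part of mteb, and the scores are made-up placeholders rather than results from this PR. For metrics such as average precision, the random baseline depends on the class balance, so it would need adjusting.

```python
def classify_scores(scores, random_baseline=0.5, tol=0.05):
    """Label a set of model scores as 'trivial', 'random', or 'ok'.

    scores: mapping of model name -> main score in [0, 1].
    random_baseline and tol are assumed values, not mteb defaults.
    """
    values = list(scores.values())
    # Trivial: every model is within tol of a perfect score.
    if all(s >= 1.0 - tol for s in values):
        return "trivial"
    # Random: every model is within tol of the assumed chance level.
    if all(abs(s - random_baseline) <= tol for s in values):
        return "random"
    return "ok"


# Placeholder scores for illustration only; not results from this PR.
example_scores = {"model_a": 0.71, "model_b": 0.68}
print(classify_scores(example_scores))
```

Both models must fall in the same band for the task to be flagged, which matches the wording of the checklist ("both models gain close to ...").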

May 15 '24 21:05 wissam-sib

Code changes LGTM. I wonder whether the similar scores from both models are a potential issue?

I tested with a Persian embedding model and it made a difference. However, I decided to switch to another label available in the dataset (entailment/contradiction), which leads to larger variability.
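
For readers unfamiliar with the setup, one way to build binary pairs from entailment/contradiction labels is sketched below. The field names (premise, hypothesis, label), the label codes ("e", "c", "n"), and the output layout are assumptions made for illustration; they are not taken from the dataset file in this PR, and the rows are invented English placeholders.

```python
def to_pair_classification(rows, positive="e", negative="c"):
    """Convert NLI-style rows into parallel lists for binary pair classification.

    Rows whose label is neither `positive` nor `negative` (for example
    a neutral class) are dropped. All field names here are assumed.
    """
    sentence1, sentence2, labels = [], [], []
    for row in rows:
        if row["label"] == positive:
            y = 1
        elif row["label"] == negative:
            y = 0
        else:
            continue  # skip labels outside the chosen pair
        sentence1.append(row["premise"])
        sentence2.append(row["hypothesis"])
        labels.append(y)
    return {"sentence1": sentence1, "sentence2": sentence2, "labels": labels}


# Invented placeholder rows; not actual FarsTail data.
rows = [
    {"premise": "A man is reading.", "hypothesis": "Someone reads.", "label": "e"},
    {"premise": "A man is reading.", "hypothesis": "Nobody reads.", "label": "c"},
    {"premise": "A man is reading.", "hypothesis": "He likes tea.", "label": "n"},
]
pairs = to_pair_classification(rows)
print(pairs["labels"])
```

Dropping the third class keeps the task strictly binary; whether that matches the final task definition in the PR is not stated in this thread.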

May 16 '24 05:05 wissam-sib
