
Add Indic xnli pair classification

Open SaitejaUtpala opened this issue 10 months ago • 2 comments

Checklist for adding MMTEB dataset

Reason for dataset addition:

  • [ ] I have tested that the dataset runs with the mteb package.
  • [ ] I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command.
    • [ ] sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • [ ] intfloat/multilingual-e5-small
  • [ ] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • [ ] If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
  • [ ] I have filled out the metadata object in the dataset file (find documentation on it here).
  • [ ] Run tests locally to make sure nothing is broken using make test.
  • [ ] Run the formatter to format the code using make lint.
  • [ ] I have added points for my submission to the points folder using the PR number as the filename (e.g. 438.jsonl).
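For reference, the subsampling item in the checklist can be illustrated with a minimal, library-free sketch. This is not mteb's self.stratified_subsampling() implementation; the function name stratified_subsample, the label_key argument, and the default seed are all hypothetical and only show the general idea of shrinking a split while keeping label proportions roughly intact.

```python
import random
from collections import defaultdict


def stratified_subsample(rows, label_key="label", n_samples=2048, seed=42):
    """Return at most n_samples rows, keeping label proportions roughly intact.

    Hypothetical helper for illustration; not part of the mteb package.
    """
    if len(rows) <= n_samples:
        return list(rows)

    rng = random.Random(seed)

    # Group rows by their label value.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    sampled = []
    for group in by_label.values():
        # Each label gets a share proportional to its frequency.
        k = round(n_samples * len(group) / len(rows))
        sampled.extend(rng.sample(group, min(k, len(group))))

    # Rounding can overshoot slightly, so shuffle and truncate.
    rng.shuffle(sampled)
    return sampled[:n_samples]
```

With 3000 rows of one label and 1000 of another, this sketch would keep 1536 and 512 rows respectively, preserving the 3:1 ratio.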

SaitejaUtpala avatar Apr 26 '24 13:04 SaitejaUtpala

waiting until #582

SaitejaUtpala avatar Apr 26 '24 13:04 SaitejaUtpala

waiting until #582

This is not really an issue; @loicmagne already answered: you need to adjust your dataset to match the format expected by the PC task. Please check existing examples of PC tasks that use a dataset_transform() function to adjust datasets to the expected format.

We'll think about updating this format, but for now it's not a priority. Note that this update would require updating all mteb/*PC datasets.
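As a rough illustration of the kind of adjustment meant here, the sketch below reshapes NLI-style rows into a single record of parallel lists. Everything in it is an assumption rather than the confirmed mteb contract: the input column names (premise, hypothesis, label), the integer label coding (0 = entailment, 1 = neutral, 2 = contradiction), the output keys (sentence1, sentence2, labels), and the helper name to_pair_classification_format. Check an existing PC task in the repository for the actual expected keys.

```python
# Assumed XNLI-style label coding; verify against the actual dataset card.
ENTAILMENT, NEUTRAL, CONTRADICTION = 0, 1, 2


def to_pair_classification_format(rows):
    """Collapse per-example NLI rows into one record of parallel lists.

    Hypothetical helper: neutral pairs are dropped so the task becomes
    binary (entailment -> 1, contradiction -> 0).
    """
    kept = [r for r in rows if r["label"] in (ENTAILMENT, CONTRADICTION)]
    return {
        "sentence1": [r["premise"] for r in kept],
        "sentence2": [r["hypothesis"] for r in kept],
        "labels": [1 if r["label"] == ENTAILMENT else 0 for r in kept],
    }


rows = [
    {"premise": "A man is eating.", "hypothesis": "Someone eats.", "label": 0},
    {"premise": "A man is eating.", "hypothesis": "He is hungry.", "label": 1},
    {"premise": "A man is eating.", "hypothesis": "Nobody eats.", "label": 2},
]
record = to_pair_classification_format(rows)
```

In a real task this logic would live inside dataset_transform() and be applied to each split.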

imenelydiaker avatar Apr 29 '24 07:04 imenelydiaker

From the checklist it seems like this is still a work in progress. Will close it for now, but feel free to re-open it.

KennethEnevoldsen avatar May 21 '24 09:05 KennethEnevoldsen