
fix: Convert Reddit cluster s2s and p2p to fast

Open • isaac-chung opened this issue 1 month ago • 8 comments

Checklist for adding MMTEB dataset

Resolve https://github.com/embeddings-benchmark/mteb/issues/728

  • [x] I have tested that the dataset runs with the mteb package.
  • [x] I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command.
    • [x] sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • [x] intfloat/multilingual-e5-small
  • [x] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • [x] If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
  • [x] I have filled out the metadata object in the dataset file (find documentation on it here).
  • [x] Run tests locally to make sure nothing is broken using make test.
  • [x] Run the formatter to format the code using make lint.
  • [x] I have added points for my submission to the points folder using the PR number as the filename (e.g. 438.jsonl).
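
The subsampling item in the checklist above can be illustrated with a small, self-contained sketch. This is not the mteb implementation of self.stratified_subsampling(); the function name stratified_subsample, its parameters, and the seed are hypothetical and chosen only to show the idea of capping a dataset at roughly 2048 examples while keeping label proportions approximately intact.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, labels, n_samples=2048, seed=42):
    """Hypothetical helper: return at most n_samples (example, label) pairs,
    drawing from each label in proportion to its share of the data."""
    # Small datasets are returned unchanged.
    if len(examples) <= n_samples:
        return list(zip(examples, labels))

    rng = random.Random(seed)

    # Group examples by their label.
    by_label = defaultdict(list)
    for example, label in zip(examples, labels):
        by_label[label].append(example)

    total = len(examples)
    sampled = []
    for label, group in sorted(by_label.items()):
        # Proportional share of the budget, with at least one example per label.
        k = max(1, round(n_samples * len(group) / total))
        k = min(k, len(group))
        sampled.extend((ex, label) for ex in rng.sample(group, k))

    # Rounding can overshoot the budget slightly, so shuffle and truncate.
    rng.shuffle(sampled)
    return sampled[:n_samples]


if __name__ == "__main__":
    examples = list(range(10_000))
    labels = ["a" if i % 4 == 0 else "b" for i in examples]
    subset = stratified_subsample(examples, labels)
    print(len(subset))
```

A fixed seed keeps the subsample reproducible across runs, which matters when results from different models are compared on the same task.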

isaac-chung • May 15 '24 12:05