ReRanking Arabic Medical QA

Open Akash190104 opened this issue 10 months ago • 5 comments

ReRanking Arabic Medical QA.

Checklist for adding MMTEB dataset

Reason for dataset addition:

  • [x] I have tested that the dataset runs with the mteb package.
  • [x] I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command.
    • [x] sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • [ ] intfloat/multilingual-e5-small
  • [x] I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • [x] If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
  • [ ] I have filled out the metadata object in the dataset file (find documentation on it here).
  • [x] Run tests locally to make sure nothing is broken using make test.
  • [x] Run the formatter to format the code using make lint.
  • [ ] I have added points for my submission to the points folder using the PR number as the filename (e.g. 438.jsonl).
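
The mteb run -m {model_name} -t {task_name} command from the checklist can be filled in for both listed models, as in the minimal sketch below. The task name YourTaskName is a placeholder, since the thread does not give the registered task name, and the script only builds and prints the commands rather than running them.

```python
import shlex

# Models listed in the checklist above.
MODELS = [
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "intfloat/multilingual-e5-small",
]

# Placeholder (assumption): the thread does not state the registered task name.
TASK_NAME = "YourTaskName"


def build_command(model_name: str, task_name: str) -> list[str]:
    """Fill in the checklist template: mteb run -m {model_name} -t {task_name}."""
    return ["mteb", "run", "-m", model_name, "-t", task_name]


# One argument list per model, ready to print or hand to a process runner.
commands = [build_command(model, TASK_NAME) for model in MODELS]

if __name__ == "__main__":
    for cmd in commands:
        # shlex.join renders the argument list as a shell-safe string.
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argument list can be passed to subprocess.run(cmd, check=True) to execute it directly.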

Akash190104, Apr 24 '24 15:04

I need some help with this. The models take a long time to run, which is why I did not run intfloat/multilingual-e5-small. @KennethEnevoldsen, since I already made the mistake once of assuming I had added the first dataset in a language for a task (when one already existed), I want to confirm whether this is actually the first Arabic dataset for the reranking task, as it appears to be.

Akash190104, Apr 24 '24 15:04

@Akash190104 Was it on CPU or GPU? If CPU, you could try Colab or a similar alternative.

asparius, Apr 24 '24 16:04

It was on CPU. I will try it on GPU, thanks.

Akash190104, Apr 24 '24 16:04

I tried running intfloat/multilingual-e5-small, but the run got terminated. Any idea why that might be?

Akash190104, Apr 24 '24 16:04

Maybe a problem with your hardware? Can you copy and paste the error, please?

imenelydiaker, Apr 24 '24 17:04

@Akash190104 any news on this PR?

imenelydiaker, May 07 '24 13:05

Yeah, I am sorry for not closing the PR. I contacted the author of the Hugging Face dataset, and this is what they said:

"Hello it’s part of new paper that we are working on so I would prefer if it’s not added to mteb just yet we need to go through the release process first"

The paper should be out in two weeks, and I can reopen the PR then. We will also have the necessary citations and metadata information by then.

Akash190104, May 07 '24 13:05