mteb SentenceTransformer hangs

Packages:

!pip install -q git+https://github.com/UKPLab/sentence-transformers.git
!pip install -q git+https://github.com/embeddings-benchmark/mteb.git
!pip install -q git+https://github.com/NouamaneTazi/beir.git@fix_drpes_ids
!pip install -q evaluate

Running this:

import time
from mteb import MTEB
from sentence_transformers import SentenceTransformer

class SentenceTransformerX(SentenceTransformer):
  pass

model_name = "sentence-transformers/average_word_embeddings_komninos"


model = SentenceTransformerX(model_name)
evaluation = MTEB(tasks=["SciFact"])
a = time.time()
results = evaluation.run(model, output_folder=f"results/{model_name}", overwrite_results=True)
b = time.time()

hangs at:

 p = ctx.Process(
                target=SentenceTransformer._encode_multi_process_worker,
                args=(process_id, device_name, self.model, input_queue, output_queue),
                daemon=True,
            )

I think you're the expert here - any ideas? @NouamaneTazi

This only affects the latest BEIR, so I think it has something to do with DRPES. Using the setup below works fine:

!pip install -q git+https://github.com/UKPLab/sentence-transformers.git
!pip install -q git+https://github.com/embeddings-benchmark/mteb.git
!pip install beir==1.0.0

Sep 12 '22 07:09 Muennighoff

Can you retry with this script?

import time
from mteb import MTEB
from sentence_transformers import SentenceTransformer

class SentenceTransformerX(SentenceTransformer):
  pass

model_name = "sentence-transformers/average_word_embeddings_komninos"

if __name__ == '__main__':  
    model = SentenceTransformerX(model_name)
    evaluation = MTEB(tasks=["SciFact"])
    a = time.time()
    results = evaluation.run(model, output_folder=f"results/{model_name}", overwrite_results=True)
    b = time.time()

You should have gotten the error below when running your script. It just means that you need to put your starting code under __main__:

Error while evaluating SciFact: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
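For reference, here is a minimal sketch of the idiom the error message describes, using only the standard library. The names square, _worker and run_in_child are hypothetical and not part of MTEB, BEIR or sentence-transformers; the sketch assumes the code is saved to a file and run as a script with the spawn start method.

```python
import multiprocessing as mp


def square(x):
    # Plain top-level function, so a spawned child can import it by name.
    return x * x


def _worker(x, queue):
    # Runs in the child process and sends the result back to the parent.
    queue.put(square(x))


def run_in_child(x):
    # "spawn" starts a fresh interpreter that re-imports the main module,
    # which is why process creation must not happen at import time.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    process = ctx.Process(target=_worker, args=(x, queue), daemon=True)
    process.start()
    result = queue.get(timeout=60)  # fail with queue.Empty instead of waiting forever
    process.join()
    return result


if __name__ == "__main__":
    # Only the process that was started as the script reaches this branch;
    # the spawned child imports the module under another name and skips it.
    print(run_in_child(7))
```

Without the guard, the child would re-execute the process-starting code while it is still importing the main module, which is the situation the message above complains about.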

Sep 12 '22 09:09 NouamaneTazi

Doesn't solve it in a notebook for me, see here: https://colab.research.google.com/drive/10LcYa6-G_NOw8aLaq0ezU2g2FtHZgviO?usp=sharing

Sep 12 '22 10:09 Muennighoff

It seems that the script must be run independently for the multiprocessing to work (not from inside an interactive session for example). So this is how I made it work:

1. Create a new file called scifact.py with the content of your script.
2. Run %run -i 'scifact.py' from Colab's notebook.
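Outside of IPython, a rough equivalent of "write the code to a file and run that file" can be sketched with the standard library. This is an assumption-laden illustration, not the method used above: run_as_script is a hypothetical helper, it launches a fresh interpreter rather than using %run, and demo_source is a tiny stand-in for the real MTEB script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_as_script(source, filename="scifact.py"):
    """Write `source` to a file in a temporary directory and run it in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / filename
        path.write_text(source)
        # sys.executable is the interpreter currently running this code.
        return subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            check=False,
        )


# Stand-in script: it only reports the module name it was run under.
demo_source = 'if __name__ == "__main__":\n    print("ran as", __name__)\n'

completed = run_as_script(demo_source)
print(completed.stdout.strip())
```

Because the file is executed as a real script, the `if __name__ == "__main__":` block inside it runs and any child processes can re-import the file by path.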

Sep 13 '22 07:09 NouamaneTazi

Nice, do you know why it works with the normal SentenceTransformer class though? It's odd that it only fails when subclassing it.
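One possible explanation, offered as an unconfirmed hypothesis: the worker is started through a multiprocessing context (the ctx.Process call above), and with a non-fork start method the arguments, including self.model, are pickled for the child. pickle records a class by its module and qualified name rather than by value, so a subclass defined in a notebook cell is recorded as living in __main__, which the child may not be able to resolve, whereas the stock class is importable from its package. The sketch below only demonstrates the by-reference behaviour; Base and Sub are hypothetical stand-ins, and whether this is what causes the hang here has not been verified.

```python
import pickle


class Base:
    def __init__(self, name):
        self.name = name


class Sub(Base):
    # Stand-in for a subclass defined in the top-level script or notebook cell.
    pass


def class_reference(obj):
    """Return the (module, qualified name) pair that identifies obj's class."""
    cls = type(obj)
    return cls.__module__, cls.__qualname__


instance = Sub("demo")
payload = pickle.dumps(instance)
module, qualname = class_reference(instance)

# The payload contains the class's module and name as text, not its code,
# so whoever unpickles it must be able to look the class up under that name.
print(module, qualname)
print(module.encode() in payload, qualname.encode() in payload)
```

If this is indeed the mechanism, moving the subclass into an importable module (a .py file on the path) would be one thing to try, since the child could then import it by name.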

Sep 13 '22 07:09 Muennighoff

Oh wait, you're right! Nice catch. Then never mind my previous solution, you can just do this:

import time
from mteb import MTEB
from sentence_transformers import SentenceTransformer

if __name__ == '__main__':  
    class SentenceTransformerX(SentenceTransformer):
      pass
    model_name = "sentence-transformers/average_word_embeddings_komninos"
    
    model = SentenceTransformerX(model_name)
    evaluation = MTEB(tasks=["SciFact"])
    a = time.time()
    results = evaluation.run(model, output_folder=f"results/{model_name}", overwrite_results=True)
    b = time.time()

Sep 13 '22 07:09 NouamaneTazi

I tried it in the same notebook and it also hangs I think?

Sep 13 '22 08:09 Muennighoff

Yeah, I just restarted my session and ran the same cell, and now it hangs 🤔 No idea what's happening there, but I'm glad at least the script solution works.

Sep 13 '22 08:09 NouamaneTazi

Should we close this?

Sep 18 '22 16:09 NouamaneTazi

Hmm, it's not really solved, is it? I'm not sure whether it's an MTEB, BEIR, SentenceTransformers, or Python limitation.

Sep 18 '22 17:09 Muennighoff

Can confirm that pip install beir==1.0.0 fixes this for me too. Can we add this as a required dependency?

Nov 08 '22 20:11 Rabrg

Yeah, this PR would fix it: https://github.com/embeddings-benchmark/mteb/pull/86. It stops using the parallel processor added to BEIR in 1.0.1. The only disadvantage is that it removes the possibility of running on multiple GPUs (it will just use 1 GPU).

Nov 08 '22 21:11 Muennighoff

This issue seems to be resolved (especially as we no longer rely on BEIR)

Jun 05 '24 18:06 KennethEnevoldsen