SentenceTransformer hangs
Packages:
!pip install -q git+https://github.com/UKPLab/sentence-transformers.git
!pip install -q git+https://github.com/embeddings-benchmark/mteb.git
!pip install -q git+https://github.com/NouamaneTazi/beir.git@fix_drpes_ids
!pip install -q evaluate
Doing
import time
from mteb import MTEB
from sentence_transformers import SentenceTransformer
class SentenceTransformerX(SentenceTransformer):
    pass
model_name = "sentence-transformers/average_word_embeddings_komninos"
model = SentenceTransformerX(model_name)
evaluation = MTEB(tasks=["SciFact"])
a = time.time()
results = evaluation.run(model, output_folder=f"results/{model_name}", overwrite_results=True)
b = time.time()
hangs at
p = ctx.Process(
    target=SentenceTransformer._encode_multi_process_worker,
    args=(process_id, device_name, self.model, input_queue, output_queue),
    daemon=True,
)
I think you're the expert here - any ideas? @NouamaneTazi
This only affects the latest BEIR, so I think it has something to do with DRPES. Using the packages below works fine:
!pip install -q git+https://github.com/UKPLab/sentence-transformers.git
!pip install -q git+https://github.com/embeddings-benchmark/mteb.git
!pip install beir==1.0.0
Can you retry with this script?
import time
from mteb import MTEB
from sentence_transformers import SentenceTransformer
class SentenceTransformerX(SentenceTransformer):
    pass
model_name = "sentence-transformers/average_word_embeddings_komninos"
if __name__ == '__main__':
    model = SentenceTransformerX(model_name)
    evaluation = MTEB(tasks=["SciFact"])
    a = time.time()
    results = evaluation.run(model, output_folder=f"results/{model_name}", overwrite_results=True)
    b = time.time()
You should have gotten the following error when running your script. It just means that you need to put your startup code under the if __name__ == '__main__': guard:
Error while evaluating SciFact:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
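For reference, here is a minimal, library-free sketch of the idiom that error message describes. It is not MTEB or BEIR code: text_length and main are hypothetical names, and the "spawn" start method is chosen explicitly as an assumption, since that is the start method the error refers to.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def text_length(text):
    # Hypothetical stand-in for an expensive encode step. It is defined at
    # module level so that a spawned worker can import it by name.
    return len(text)


def main():
    # "spawn" starts a fresh interpreter per worker, and each worker
    # re-imports this file before it runs anything.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        return list(pool.map(text_length, ["sci", "fact"], timeout=60))


if __name__ == "__main__":
    # Without this guard, the re-import in each worker would call main()
    # again and try to start workers during bootstrapping.
    print(main())
```

Saved as a standalone file and run with python, the guard keeps the workers from re-running main() when they import the file. This only covers the script case; it does not explain the notebook behaviour.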
Doesn't solve it in a notebook for me, see here: https://colab.research.google.com/drive/10LcYa6-G_NOw8aLaq0ezU2g2FtHZgviO?usp=sharing
It seems that the script must be run independently for the multiprocessing to work (not from inside an interactive session for example). So this is how I made it work:
- create a new file called scifact.py with the content of your script
- run %run -i 'scifact.py' from Colab's notebook
Nice, do you know why it works with the normal SentenceTransformer class though? It's odd that it only fails when subclassing it.
Oh wait, you're right! Nice catch. Then never mind my previous solution, you can just do this:
import time
from mteb import MTEB
from sentence_transformers import SentenceTransformer
if __name__ == '__main__':
    class SentenceTransformerX(SentenceTransformer):
        pass
    model_name = "sentence-transformers/average_word_embeddings_komninos"
    model = SentenceTransformerX(model_name)
    evaluation = MTEB(tasks=["SciFact"])
    a = time.time()
    results = evaluation.run(model, output_folder=f"results/{model_name}", overwrite_results=True)
    b = time.time()
I tried it in the same notebook and it also hangs I think?
Yeah I just restarted my session and ran the same cell and now it hangs 🤔 No idea what's happening there, but I'm glad at least the script solution works
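One possible explanation for why only the subclass is affected, offered as an assumption and not confirmed anywhere in this thread: pickle stores an instance's class as a module-and-name reference, not as code, so a worker started in a fresh interpreter has to import that name again. A class defined in a notebook cell lives in the notebook's __main__, which a fresh worker may not be able to import, whereas the stock SentenceTransformer is importable from its installed package. The sketch below uses a hypothetical Base/Sub pair in place of the real classes and only shows what the pickled payload contains.

```python
import pickle


class Base:
    """Hypothetical stand-in for a class imported from an installed package."""


class Sub(Base):
    """Hypothetical stand-in for a subclass defined in a notebook cell."""


payload = pickle.dumps(Sub())

# The payload holds only the names needed to look the class up again,
# so whoever unpickles it must be able to import Sub from this module.
print(Sub.__module__, Sub.__qualname__)
print(Sub.__module__.encode() in payload, Sub.__qualname__.encode() in payload)
```

If this is what happens here, the worker would fail while unpickling its arguments and the parent would keep waiting on the output queue, which would look like a hang. That chain of events is a guess and would need to be checked against the worker's actual traceback.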
Should we close this?
Hmm, it's not really solved, is it? I'm not sure whether it's an MTEB / BeIR / SentenceTransformers limitation or a Python one.
Can confirm pip install beir==1.0.0 fixes this for me too. Can we add this as a required dependency?
Yeah, this PR would fix it: https://github.com/embeddings-benchmark/mteb/pull/86. It stops using the parallel processor added to BEIR in 1.0.1. The only disadvantage is that it removes the ability to run on multiple GPUs (it will just use one GPU).
This issue seems to be resolved (especially as we no longer rely on BEIR)