Which is the fastest way to run inference with an LLM-based embedder?
Hi, I was looking at the BAAI/bge-multilingual-gemma2 model.
When I use a GPU for inference via transformers, I find it very slow: it takes several seconds to encode a single sentence. Is that normal? How long does it usually take to get an embedding?
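For reference, this is roughly what I'm running with transformers (a minimal sketch following the model card's last-token-pooling recipe; fp16 and the max length are my own choices):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def last_token_pool(last_hidden_states, attention_mask):
    # Decoder-only embedder: take the hidden state of the final non-padding token.
    # If the batch is left-padded, the last position is that token for every row.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_indices = torch.arange(last_hidden_states.shape[0], device=last_hidden_states.device)
    return last_hidden_states[batch_indices, sequence_lengths]

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-multilingual-gemma2")
model = AutoModel.from_pretrained(
    "BAAI/bge-multilingual-gemma2",
    torch_dtype=torch.float16,  # fp16 halves memory and speeds up GPU inference
).to("cuda").eval()

sentences = ["What is BGE M3?"]
batch = tokenizer(
    sentences, padding=True, truncation=True, max_length=512, return_tensors="pt"
).to("cuda")
with torch.no_grad():
    outputs = model(**batch)
embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit-normalize for cosine similarity
print(embeddings.shape)
```

Even with this, a single-sentence call is dominated by per-call overhead on a 9B-parameter model, so I'd expect batching to matter a lot.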
I noticed that FlagEmbedding and sentence_transformers can also be used for inference. Which of these is the fastest?
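For the FlagEmbedding route, this is the kind of usage I mean (a sketch assuming the FlagLLMModel wrapper for decoder-only embedders applies to this model; the instruction string is just the retrieval example from the docs):

```python
from FlagEmbedding import FlagLLMModel  # wrapper for LLM-based embedders

model = FlagLLMModel(
    "BAAI/bge-multilingual-gemma2",
    query_instruction_for_retrieval="Given a web search query, retrieve relevant passages that answer the query.",
    use_fp16=True,  # fp16 inference on GPU
)

queries = ["how much protein should a female eat"]
passages = ["As a general guideline, adult women need around 46 grams of protein per day."]

q_emb = model.encode_queries(queries)  # the instruction is prepended to each query
p_emb = model.encode_corpus(passages)  # passages are encoded as-is
print(q_emb @ p_emb.T)                 # similarity scores
```

And for sentence_transformers, something like this (sketch; passing torch_dtype via model_kwargs assumes a reasonably recent sentence-transformers version):

```python
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "BAAI/bge-multilingual-gemma2",
    model_kwargs={"torch_dtype": torch.float16},  # load weights in fp16
)
embeddings = model.encode(
    ["What is BGE M3?"],
    batch_size=32,               # batching amortizes per-call overhead
    normalize_embeddings=True,   # unit-normalize for cosine similarity
)
print(embeddings.shape)
```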
Would vLLM help here?
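This is what I had in mind (a sketch assuming a vLLM version whose embedding mode supports Gemma2-based models; I haven't verified that this model is supported):

```python
from vllm import LLM

# task="embed" switches vLLM from generation to embedding (pooling) mode
llm = LLM(model="BAAI/bge-multilingual-gemma2", task="embed", dtype="float16")

outputs = llm.embed(["What is BGE M3?", "Explanation of BM25"])
for out in outputs:
    print(len(out.outputs.embedding))  # embedding dimension per input
```

If anyone has benchmarked these options against each other for this model, I'd appreciate rough numbers.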