André Pankraz
Hi, I just call your methods without much fluff:

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
passages_outputs = model.model(passages_inputs, return_dense=False, return_sparse=True, return_colbert=False, return_sparse_embedding=True)

I just follow compute_score here: https://github.com/FlagOpen/FlagEmbedding/blob/11dc092e39ed0ff6e715866b2bdaca0cc775a296/FlagEmbedding/bge_m3.py#L188...
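For reference, a minimal sketch of the same sparse-embedding extraction through the public encode() API of FlagEmbedding (the passage text and variable names below are placeholders, not taken from the original comment):

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

# Hypothetical input passages, just for illustration.
passages = ["BGE-M3 supports dense, sparse and multi-vector retrieval."]

# encode() returns a dict; with return_sparse=True it includes
# "lexical_weights": one {token_id: weight} mapping per passage.
output = model.encode(
    passages,
    return_dense=False,
    return_sparse=True,
    return_colbert_vecs=False,
)
print(output["lexical_weights"][0])
```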
Thank you all, I will try it. In that case you should adapt your example at https://huggingface.co/BAAI/bge-m3 ("Compute score for text pairs"), which uses model.compute_score(), which in turn uses sparse embeddings?...
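For context, the model-card snippet being referred to looks roughly like this (the sentence pair and mode weights are illustrative values, not taken from the comment):

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

# Illustrative text pair; compute_score combines dense, sparse (lexical)
# and ColBERT scores according to weights_for_different_modes.
sentence_pairs = [
    ["what is BGE M3?",
     "BGE M3 is an embedding model supporting dense, lexical and multi-vector retrieval."],
]

scores = model.compute_score(
    sentence_pairs,
    weights_for_different_modes=[0.4, 0.2, 0.4],
)
print(scores)  # per-mode scores plus weighted combinations
```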
couldn't reproduce either, closing. thx
I think this can be closed; the world has moved on. BAAI/bge-reranker-v2-m3 works just fine for multilingual use and is supported.
While we're glad the issue is resolved with the suggested setting, we don’t consider the matter fully closed. In our view, it’s problematic that the default Docker container doesn’t work...
Thanks for the follow-up. We're more on the "average Joe end-user" side of the VLLM ecosystem — we usually rely on the official Docker images rather than building from source...
Sorry, I have no benchmark that triggers this problem reliably - the issue happens in real-life load scenarios, as the previous commenter said. What I can say: for us VLLM_USE_FLASHINFER_SAMPLER=0...
We had it with both:
- NVIDIA-SMI 575.51.03, Driver Version: 575.51.03, CUDA Version: 12.9
- NVIDIA-SMI 550.163.01, Driver Version: 550.163.01, CUDA Version: 12.4
We have the problem with Qwen2.5-72B on 4 x...
@hnt2601 No - we will not patch the code or anything like that; we'll just wait for a working Docker container. For us, this worked to stabilize the container: VLLM_USE_FLASHINFER_SAMPLER:...
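For anyone running vLLM directly in Python rather than via the Docker image, a minimal sketch of the same workaround (in the container case the variable is simply set in the container environment; the model name and tensor-parallel size below are assumptions mirroring the 4-GPU Qwen2.5-72B setup mentioned above):

```python
import os

# Disable the FlashInfer sampler; set this before constructing the engine
# so vLLM picks it up when it reads its environment configuration.
os.environ["VLLM_USE_FLASHINFER_SAMPLER"] = "0"

from vllm import LLM, SamplingParams

# Assumed model and parallelism; adjust to your own hardware.
llm = LLM(model="Qwen/Qwen2.5-72B-Instruct", tensor_parallel_size=4)

outputs = llm.generate(
    ["Hello, how are you?"],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```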
Oh... I didn't notice this was an intentional change, sorry. In development mode we just pick up the updates and don't pin the version, so I thought it was some problem. Hmm, I think templates...