André Pankraz
Hi, I just call your methods without much fluff:

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
passages_outputs = model.model(passages_inputs, return_dense=False, return_sparse=True, return_colbert=False, return_sparse_embedding=True)

I just follow compute_score here: https://github.com/FlagOpen/FlagEmbedding/blob/11dc092e39ed0ff6e715866b2bdaca0cc775a296/FlagEmbedding/bge_m3.py#L188...
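For reference, a minimal sketch of the same sparse-embedding extraction through the public encode() API of FlagEmbedding (the passage text and variable names below are placeholders, not taken from the original comment):

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

# Hypothetical input passages, just for illustration.
passages = ["BGE-M3 supports dense, sparse and multi-vector retrieval."]

# encode() returns a dict; with return_sparse=True it includes
# "lexical_weights": one {token_id: weight} mapping per passage.
output = model.encode(
    passages,
    return_dense=False,
    return_sparse=True,
    return_colbert_vecs=False,
)
print(output["lexical_weights"][0])
```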
Thank you all, I will try it. In that case you should adapt your example at https://huggingface.co/BAAI/bge-m3 ("Compute score for text pairs"), which uses model.compute_score(), which in turn uses sparse embeddings?...
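For context, the model-card snippet being referred to looks roughly like this (the sentence pair and mode weights are illustrative values, not taken from the comment):

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

# Illustrative text pair; compute_score combines dense, sparse (lexical)
# and ColBERT scores according to weights_for_different_modes.
sentence_pairs = [
    ["what is BGE M3?",
     "BGE M3 is an embedding model supporting dense, lexical and multi-vector retrieval."],
]

scores = model.compute_score(
    sentence_pairs,
    weights_for_different_modes=[0.4, 0.2, 0.4],
)
print(scores)  # per-mode scores plus weighted combinations
```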
couldn't reproduce either, closing. thx
I think this can be closed; the world has moved on. BAAI/bge-reranker-v2-m3 works just fine for multilingual use and is supported.
While we're glad the issue is resolved with the suggested setting, we don’t consider the matter fully closed. In our view, it’s problematic that the default Docker container doesn’t work...
Thanks for the follow-up. We're more on the "average Joe end-user" side of the VLLM ecosystem — we usually rely on the official Docker images rather than building from source...
Sorry, I have no benchmark that triggers this problem reliably - the issue happens in real-life load scenarios, as the previous commenter said. What I can say: for us VLLM_USE_FLASHINFER_SAMPLER=0...
We had it with both:
- NVIDIA-SMI 575.51.03, Driver Version: 575.51.03, CUDA Version: 12.9
- NVIDIA-SMI 550.163.01, Driver Version: 550.163.01, CUDA Version: 12.4
We have the problem with Qwen2.5-72B on 4 x...
@hnt2601 No - we will not patch the code or anything like that; we'll just wait for a working Docker container. For us, this worked to stabilize the container: VLLM_USE_FLASHINFER_SAMPLER:...
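For anyone running vLLM directly in Python rather than via the Docker image, a minimal sketch of the same workaround (in the container case the variable is simply set in the container environment; the model name and tensor-parallel size below are assumptions mirroring the 4-GPU Qwen2.5-72B setup mentioned above):

```python
import os

# Disable the FlashInfer sampler; set this before constructing the engine
# so vLLM picks it up when it reads its environment configuration.
os.environ["VLLM_USE_FLASHINFER_SAMPLER"] = "0"

from vllm import LLM, SamplingParams

# Assumed model and parallelism; adjust to your own hardware.
llm = LLM(model="Qwen/Qwen2.5-72B-Instruct", tensor_parallel_size=4)

outputs = llm.generate(
    ["Hello, how are you?"],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```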
Oh... I didn't notice this was an intentional change, sorry. In development mode we just pick up the updates and don't pin the version, so I thought it was some problem. Hmm, I think templates...