Sebastian.W
@ggerganov I am running the bge-rerank-v2-m3 model with llama.cpp server b4641. The rerank API is working, but the score seems weird: it always returns "1" for the most matching one,...
I copied your settings, and tested the API with the request below:
```json
{
  "model": "bge-reranker",
  "query": "A man is eating pasta.",
  "documents": [
    "A man is eating food.",
    "A man...
```
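For anyone else testing this from Java, here is a minimal sketch of the same request using `java.net.http`. Assumptions on my part: the server was started with `--reranking`, listens on `localhost:8080`, and exposes `POST /v1/rerank` as in recent llama.cpp builds (verify for b4641); the second document is a placeholder, since my original list is truncated above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RerankClient {
    public static void main(String[] args) throws Exception {
        // Same request body as above; the second document is a placeholder.
        String body = """
            {
              "model": "bge-reranker",
              "query": "A man is eating pasta.",
              "documents": [
                "A man is eating food.",
                "A man is riding a horse."
              ]
            }""";

        // Assumes llama-server was started with --reranking on port 8080.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/v1/rerank"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Expected (assumed) shape:
        // {"results":[{"index":0,"relevance_score":...}, ...]}
        System.out.println(response.body());
    }
}
```

The `relevance_score` values in that response are what look odd in my case: the top document comes back as exactly 1.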
@foldl Thanks for your advice, but I program in Java only... Could you kindly help create a PR?
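(For other Java folks landing here: if the advice was to apply a logistic sigmoid to the raw rerank scores on the client side, which is my assumption about the suggestion above, the sketch below is all it takes.)

```java
public final class RerankScore {

    // Logistic sigmoid: maps a raw reranker logit to the range (0, 1).
    static double sigmoid(double rawScore) {
        return 1.0 / (1.0 + Math.exp(-rawScore));
    }

    public static void main(String[] args) {
        // Hypothetical raw scores as they might come back from /v1/rerank.
        double[] rawScores = {7.5, -2.3, 0.4};
        for (double s : rawScores) {
            System.out.printf("raw=%6.2f -> normalized=%.4f%n", s, sigmoid(s));
        }
    }
}
```

Note that a large positive logit (e.g. 7.5) already maps to about 0.9994, which can print as 1 at low precision, so this may also explain the "always 1" top score.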
@foldl Thanks, but I meant: could you implement the feature in the llama.cpp server? As @ggerganov mentioned, maybe a new argument for starting the llama.cpp server to enable the feature. I...
> > I copied your settings, and tested the API with below request:
> > {
> > "model": "bge-reranker",
> > "query": "A man is eating pasta.",
> >...
> Consider --tensor-parallel-size 4 or --tensor-parallel-size 8 --enable-expert-parallel.

I am running `Qwen3-30B-A3B-FP8` with two A10 GPUs. `tp=2` is enough to load the model; does vLLM support `tp=2` in this case?
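For context, the launch I have in mind would be something like the sketch below. The full model ID and the port are my placeholders; whether this checkpoint actually runs with `tp=2` on A10s is exactly what I am asking.

```bash
# Hypothetical launch: split the checkpoint across the two A10s with
# tensor parallelism of 2. Model ID and port are illustrative placeholders.
vllm serve Qwen/Qwen3-30B-A3B-FP8 \
  --tensor-parallel-size 2 \
  --port 8000
```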
I am wondering: is this feature planned?
It turns out it was my fault: I forgot to add the user input in the chat flow. It's not an issue with Dify.
Yes, the new version still does not support the multipart/form-data content type.
I encountered the same error. The service resumed after clearing the application cache.