hahmad2008

43 comments of hahmad2008

@aarnphm What is the difference between the previous two cases, such that the first case can launch two processes, one for the Ray worker and the other for the BentoML service (that when using...

@winglian, I used FSDP with QLoRA and the model was still loaded as a full copy on each GPU. I tried it with and without passing an accelerate config and got the same...

Seems I need to enable `fsdp_offload_params: true`
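For reference, a minimal sketch of an accelerate-style FSDP config with that key set, plus the launch command. Only `fsdp_offload_params: true` comes from this thread; every other value (and `train.py`) is an assumption, so adjust for your setup:

```bash
# Sketch: write an accelerate FSDP config with parameter offload enabled,
# then launch with it. Key names follow accelerate's fsdp_config section;
# values other than fsdp_offload_params are assumed defaults.
cat > fsdp_config.yaml <<'EOF'
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
mixed_precision: bf16
num_processes: 2
fsdp_config:
  fsdp_offload_params: true
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_state_dict_type: SHARDED_STATE_DICT
EOF

# train.py is a placeholder for your training entry point.
accelerate launch --config_file fsdp_config.yaml train.py
```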

@merrymercy thanks for the prompt response. It works with `--max-prefill 4096`. Btw, is the backend vLLM? What are the available backends? For the tokenizer, how should it be set if I didn't...
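For anyone landing here, a sketch of the launch command with that prefill cap applied. The model path is a placeholder, and the flag spelling varies across SGLang versions, so check `--help` for yours:

```bash
# Sketch: SGLang server launch with a capped prefill size.
# --max-prefill 4096 is the flag that worked above; newer releases
# spell it --max-prefill-tokens.
python -m sglang.launch_server \
  --model-path <your-model> \
  --port 30000 \
  --max-prefill 4096
```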

Btw, are there any factors influencing the number of concurrent requests that I should check?

I have the same issue: with V0 I can serve mistral3.1-awq with a 4k context length on a 24G GPU, but I get an OOM if I use V1. [check here.](https://github.com/vllm-project/vllm/issues/16128#issuecomment-2782811982)
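A sketch of the workaround in the meantime: pin the engine back to V0 via the environment variable and cap the context length. The model path is a placeholder; the flags are standard vLLM options:

```bash
# Sketch: force the legacy V0 engine (VLLM_USE_V1=0 on vLLM 0.8.x)
# and cap context length so the model fits on a 24G GPU.
VLLM_USE_V1=0 vllm serve <mistral3.1-awq-path> \
  --quantization awq \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.90
```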

@paolovic @hmellor @DarkLight1337 Could you please check this ticket related to vLLM version 0.8.3? https://github.com/vllm-project/vllm/issues/16552

@majestichou @Nietism I have the same issue. Model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`; my chat template is as follows: ``` "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{%...
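For context, a custom template can also be supplied at serve time instead of editing the one bundled with the model. A sketch (the template file name is a placeholder, not the truncated template above):

```bash
# Sketch: overriding the chat template at serve time.
# --chat-template accepts a path to a Jinja template file.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --chat-template ./my_template.jinja
```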

@Nietism the `reasoning_content` is null, plus sometimes the first tag in the content is missing. How did you solve it?
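One thing worth ruling out (an assumption on my side, not confirmed in this thread): `reasoning_content` stays null unless the server is started with a reasoning parser. A sketch:

```bash
# Sketch: enabling vLLM's DeepSeek-R1 reasoning parser so the API splits
# <think>-delimited output into reasoning_content vs. content.
# On newer vLLM versions --enable-reasoning is implied by --reasoning-parser.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --enable-reasoning \
  --reasoning-parser deepseek_r1
```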