Hu Dong
Just saw a related pending PR: https://github.com/Azure/mmlspark/pull/912
Could we get this PR merged? I ran into the bug recently, and it makes qpid proton pretty much unusable.
Might be related to:
- https://github.com/vllm-project/vllm/issues/3839
- https://github.com/vllm-project/vllm/issues/4135
- https://github.com/vllm-project/vllm/issues/4293
- https://github.com/vllm-project/vllm/issues/6254 (which is fixed by https://github.com/vllm-project/vllm/pull/6255)
> How easy is it to reproduce the issue?

It's about 1/10 I think. It seemed to be very random, at least not directly caused by request concurrency, nor prompt...
> Also, Is it possible to reproduce it with CUDA_LAUNCH_BLOCKING=1 and show us the line?

We just tried. Here's the stacktrace with the env variable:

```
ERROR 04-30 11:35:13 async_llm_engine.py:499]...
```
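In case anyone wants to try the same thing: a minimal sketch of how the variable can be passed into a containerized deployment (image tag, model, and port here are placeholders, not our exact setup):

```bash
# CUDA_LAUNCH_BLOCKING=1 serializes kernel launches, so the failing
# kernel is reported at its call site instead of asynchronously.
docker run --gpus all -p 8000:8000 \
  -e CUDA_LAUNCH_BLOCKING=1 \
  vllm/vllm-openai:latest \
  --model meta-llama/Meta-Llama-3-70B-Instruct \
  --tensor-parallel-size 4
```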
FYI, we actually deployed several instances. They're running on different envs. The following instances have been running for more than 5 days without any problem:

1. vLLM 0.4.0post1, tp=4 (70B...
> Can you also share the stacktrace of workers that are not stuck? (or is all workers stuck at the same line?)

Not sure whether the following is what we...
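In case it helps anyone else gather the same data: one common way to capture a running worker's Python stack is py-spy; I'm assuming something like this is how such dumps are taken (the PID below is just an example):

```bash
# Attach to a running vLLM worker process and print its current
# Python stack without stopping the process.
py-spy dump --pid 7065
```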
> Also, is there code I can try reproducing it in our env?

We were sending requests directly to the vllm container using `curl`, without any in-house code. The container...
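Roughly the shape of the requests (endpoint, model name, and payload are illustrative, not our exact traffic):

```bash
# Plain completion request against the OpenAI-compatible endpoint
# that vLLM exposes; no client library involved.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",
        "prompt": "San Francisco is a",
        "max_tokens": 64
      }'
```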
> Hmm it is actually interesting PID 7065 is running nothing. It might be the root cause of hanging. Since around that logit access code, all the workers need to...
> also one interesting thing is you use `--enable-prefix-caching`. Does it still hang without this flag? (can you just check)?

I can try reproducing it on my end in...
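For the record, that check is just relaunching with the flag dropped and everything else identical; a sketch assuming the same containerized setup as above:

```bash
# Same deployment as before, but prefix caching is left at its
# default (off) by omitting --enable-prefix-caching.
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Meta-Llama-3-70B-Instruct \
  --tensor-parallel-size 4
```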