Hu Dong

17 comments by Hu Dong

Just saw a related pending PR: https://github.com/Azure/mmlspark/pull/912

Could we get this PR merged? I ran into the bug recently; it makes Qpid Proton pretty much unusable.

Might be related to https://github.com/vllm-project/vllm/issues/3839 https://github.com/vllm-project/vllm/issues/4135 https://github.com/vllm-project/vllm/issues/4293 https://github.com/vllm-project/vllm/issues/6254 (which is fixed by https://github.com/vllm-project/vllm/pull/6255)

> How easy is it to reproduce the issue?

It's about 1 in 10, I think. It seemed very random; at least it's not directly caused by request concurrency, nor prompt...

> Also, Is it possible to reproduce it with CUDA_LAUNCH_BLOCKING=1 and show us the line?

We just tried. Here's the stacktrace with the env variable:

```
ERROR 04-30 11:35:13 async_llm_engine.py:499]...
```
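For anyone else trying this, here is roughly how we set the variable; the model name, port, and flags below are placeholders, not the exact ones from our deployment:

```shell
# CUDA_LAUNCH_BLOCKING=1 serializes CUDA kernel launches, so an async
# CUDA error is reported at the Python line that actually issued the kernel.
export CUDA_LAUNCH_BLOCKING=1

# Hypothetical launch command; substitute your own model and flags.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --tensor-parallel-size 4 \
    --port 8000
```

Note that the variable has to be set before the CUDA context is created, so exporting it in the launching shell (rather than inside a running process) is the safe way to do it.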

FYI, we actually deployed several instances, running in different environments. The following instances have been running for more than 5 days without any problem:

1. vLLM 0.4.0post1, tp=4 (70B...

> Can you also share the stacktrace of workers that are not stuck? (or are all workers stuck at the same line?)

Not sure whether the following is what we...
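In case it helps others collect the same data, per-worker stacks can be grabbed with py-spy (my assumption that this is the handiest tool here; the PID is a placeholder):

```shell
# Dump the current Python stack of one worker process without stopping it.
# Replace <PID> with the worker's process id.
py-spy dump --pid <PID>

# Or dump every vLLM worker in one go (pgrep from procps assumed).
for pid in $(pgrep -f vllm); do
    echo "=== worker $pid ==="
    py-spy dump --pid "$pid"
done
```

Comparing the dumps side by side shows immediately whether all workers are parked at the same line or one of them has diverged.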

> Also, is there code I can try reproducing it in our env?

We were sending requests directly to the vllm container using `curl`, without any in-house code. The container...
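For reference, the requests looked roughly like this; the endpoint path is the standard OpenAI-compatible one, and the host, port, and model name are placeholders rather than our exact setup:

```shell
# Hypothetical example of hitting the vLLM OpenAI-compatible server with curl.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "meta-llama/Meta-Llama-3-70B-Instruct",
          "prompt": "San Francisco is a",
          "max_tokens": 16
        }'
```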

> Hmm it is actually interesting PID 7065 is running nothing. It might be the root cause of hanging. Since around that logit access code, all the workers need to...

> also one interesting thing is you use `--enable-prefix-caching`. Does it still hang without this flag? (can you just check)?

I can try reproducing it on my end in...
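For anyone running the same A/B check, the two launches should differ only in the flag itself; a sketch (model name and other flags below are placeholders):

```shell
# Baseline: prefix caching enabled (the configuration that hangs).
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --tensor-parallel-size 4 \
    --enable-prefix-caching

# Control: identical command without the flag, to check whether it still hangs.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --tensor-parallel-size 4
```

Keeping every other flag identical between the two runs is what makes the result attributable to prefix caching rather than some other configuration difference.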