What Does This PR Do?

FIX #7454 #6818 #6614

I am using vLLM with the Qwen2-72B-Instruct model to benchmark n-gram speculative decoding. With the settings speculative_model="[ngram]", num_speculative_tokens=5, ngram_prompt_lookup_max=3, ngram_prompt_lookup_min=1, max_tokens=10240, max_num_seqs=8, max_model_len=20480, max_num_batched_tokens=32768, and num_scheduler_steps=1, a known bug appears while vLLM is running. The error I see is the same one reported in the issues listed above.
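For readers unfamiliar with the settings above, here is a minimal sketch of how n-gram prompt lookup can propose draft tokens. It is an illustration only, not vLLM's implementation: the function name `propose_ngram` and its exact matching order are assumptions, and only the three parameter names come from the configuration in this PR.

```python
from typing import List


def propose_ngram(
    tokens: List[int],
    ngram_prompt_lookup_min: int = 1,
    ngram_prompt_lookup_max: int = 3,
    num_speculative_tokens: int = 5,
) -> List[int]:
    """Propose draft tokens by finding the trailing n-gram earlier in `tokens`.

    Hypothetical helper: tries the longest n-gram first, then shorter ones,
    and returns the tokens that followed the earliest match.
    """
    for n in range(ngram_prompt_lookup_max, ngram_prompt_lookup_min - 1, -1):
        if len(tokens) <= n:
            continue  # not enough context for an n-gram of this size
        suffix = tokens[-n:]
        # Scan earlier windows only; the trailing window itself is excluded.
        for start in range(len(tokens) - n):
            if tokens[start:start + n] == suffix:
                # Copy up to num_speculative_tokens tokens after the match.
                return tokens[start + n:start + n + num_speculative_tokens]
    return []  # no match: nothing to speculate


draft = propose_ngram([1, 2, 3, 4, 5, 1, 2, 3])
print(draft)
```

In this example the trailing 3-gram `[1, 2, 3]` also occurs at the start of the sequence, so the tokens that followed it there become the draft. The target model then verifies the draft, which is why repetitive prompts benefit most from this method.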

I traced the code and found that the error originates at https://github.com/vllm-project/vllm/blob/da1a844e61366b473cef6b3f7437ea5dc41876a1/vllm/distributed/device_communicators/shm_broadcast.py#L387

The call chain is as follows:

https://github.com/vllm-project/vllm/blob/da1a844e61366b473cef6b3f7437ea5dc41876a1/vllm/distributed/device_communicators/shm_broadcast.py#L438

https://github.com/vllm-project/vllm/blob/da1a844e61366b473cef6b3f7437ea5dc41876a1/vllm/distributed/parallel_state.py#L181

https://github.com/vllm-project/vllm/blob/main/vllm/distributed/parallel_state.py#L934

So the bug is triggered when use_message_queue_broadcaster=True is set; with it set to False, everything runs normally. The bug still needs to be located precisely and confirmed by the vLLM team.

Sep 10 '24 08:09 xq25478

👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

Sep 10 '24 08:09 github-actions[bot]

Why is this related to spec decode?

"No available block found in 60 second" is just a warning. The root cause is that the vLLM engine is stuck somewhere. You can follow docs.vllm.ai/en/latest/getting_started/debugging.html to find out where it is stuck.
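To illustrate why such a message is a symptom rather than the failure itself, here is a toy sketch of a writer that polls for a free buffer block, warns periodically, and keeps waiting. It is not vLLM's shm_broadcast code; `wait_for_free_block` and all of its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_for_free_block(
    is_block_free: Callable[[], bool],
    warn_interval_s: float = 60.0,
    poll_interval_s: float = 0.001,
    warn: Callable[[str], None] = print,
) -> int:
    """Poll until a block is free, warning every `warn_interval_s` seconds.

    The warning never aborts the wait; it only reports that the other side
    has not released a block yet. Returns the number of warnings emitted.
    """
    start = time.monotonic()
    warnings_emitted = 0
    while not is_block_free():
        elapsed = time.monotonic() - start
        if elapsed >= warn_interval_s * (warnings_emitted + 1):
            warn(f"No available block found in {warn_interval_s} second.")
            warnings_emitted += 1
        time.sleep(poll_interval_s)
    return warnings_emitted


# Simulate a peer that frees a block only after 0.2 seconds.
deadline = time.monotonic() + 0.2
messages = []
count = wait_for_free_block(
    lambda: time.monotonic() >= deadline,
    warn_interval_s=0.05,
    warn=messages.append,
)
```

If the peer never frees a block, the loop warns forever, so the useful question is what the peer is blocked on, not the warning itself.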

Sep 10 '24 08:09 youkaichao

Hi, I have re-stated my problem and fix in detail above.

Sep 10 '24 08:09 xq25478

Thanks for the clarification. Can you provide a reproducible example and your environment details?

cc @cadedaniel if you have bandwidth to investigate this spec decode related issue.

Sep 10 '24 17:09 youkaichao

Closing as stale

Apr 05 '25 11:04 hmellor

[BugFix] Spec Decode error:No available block found in 60 Second.
