Yang Zheng

Results 5 issues of Yang Zheng

If two identical new prompts are submitted at the same time, with no identical prompt seen before and zero cache hits, BlockSpaceManagerV1 will allocate the...

question

`prompts = ["Hello, my name is", "Hello, my name is"]` The slot_mapping is identical for these two prompts during prefill and will update the...

Avoid creating the intermediate tensor query_lens_tensor, compute query_start_loc on the CPU, and allow an asynchronous h2d copy.

Speculative decoding: accepted tokens should be the real number of generated tokens

needs-rebase

What is activate_this.py used for? And why is venv activation forced even when --activate_venv is NOT set?