Yang Zheng
## ❓ General Questions
If two identical new prompts are submitted at the same time, with no earlier identical prompt seen so far and therefore 0 cache hits, BlockSpaceManagerV1 will allocate the...
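To make the scenario concrete, here is a toy model of prefix-caching block allocation. It is not BlockSpaceManagerV1: the class `ToyPrefixCachingAllocator`, the block size of 4, and the rule that only full blocks are shared are all assumptions for illustration. In this toy, the second identical prompt finds the first prompt's full block already registered and shares it. Whether the real allocator behaves this way when both prompts arrive in the same scheduling step is exactly what the question asks, and this sketch does not settle it.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # assumed number of tokens per KV-cache block (illustrative only)


class ToyPrefixCachingAllocator:
    """Hypothetical allocator: each full block is keyed by the entire token prefix up to it."""

    def __init__(self) -> None:
        # Real systems key by a hash; using the prefix tuple itself avoids collisions here.
        self.prefix_to_block: Dict[Tuple[int, ...], int] = {}
        self.next_free = 0
        self.hits = 0

    def _fresh_block(self) -> int:
        block = self.next_free
        self.next_free += 1
        return block

    def allocate(self, token_ids: List[int]) -> List[int]:
        """Return a block table (list of physical block ids) for one prompt."""
        block_table: List[int] = []
        prefix: Tuple[int, ...] = ()
        for start in range(0, len(token_ids), BLOCK_SIZE):
            chunk = tuple(token_ids[start:start + BLOCK_SIZE])
            prefix += chunk
            if len(chunk) < BLOCK_SIZE:
                # A trailing partial block is never shared in this toy model.
                block_table.append(self._fresh_block())
                continue
            if prefix in self.prefix_to_block:
                self.hits += 1  # an earlier prompt already registered this full block
            else:
                self.prefix_to_block[prefix] = self._fresh_block()
            block_table.append(self.prefix_to_block[prefix])
        return block_table


# Two identical prompts allocated one after the other (token ids are made up).
allocator = ToyPrefixCachingAllocator()
prompt = [101, 7592, 1010, 2026, 2171, 2003]
first = allocator.allocate(prompt)   # [0, 1]
second = allocator.allocate(prompt)  # [0, 2]: full block 0 is shared, the partial block is not
```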
With `prompts = ["Hello, my name is", "Hello, my name is"]`, the two prompts get an identical slot_mapping during prefill and will update the...
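Why identical block tables imply identical slot mappings can be sketched with the usual paged-KV-cache layout, assumed here: a token at position `pos` lands in slot `block_table[pos // block_size] * block_size + pos % block_size`. The helper `compute_slot_mapping` and the block table `[3, 7]` are hypothetical, chosen only to illustrate the arithmetic.

```python
from typing import List


def compute_slot_mapping(block_table: List[int], num_tokens: int, block_size: int) -> List[int]:
    """Map each token position of one sequence to a flat slot index in the paged KV cache."""
    slots: List[int] = []
    for pos in range(num_tokens):
        block_number = block_table[pos // block_size]  # physical block holding this position
        block_offset = pos % block_size  # position inside that block
        slots.append(block_number * block_size + block_offset)
    return slots


# If both prompts were handed the same (hypothetical) block table [3, 7],
# their prefill slot mappings coincide, so both would write K/V into the same slots.
seq_a = compute_slot_mapping([3, 7], num_tokens=6, block_size=4)  # [12, 13, 14, 15, 28, 29]
seq_b = compute_slot_mapping([3, 7], num_tokens=6, block_size=4)
```

Only a different block table separates the two writes; for example, `[3, 8]` would send the last two tokens to slots 32 and 33 instead.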
Avoid creating the intermediate tensor `query_lens_tensor`, compute `query_start_loc` on the CPU, and allow the host-to-device (H2D) copy to run asynchronously.
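A minimal sketch of the CPU-side computation, assuming `query_start_loc` is the exclusive prefix sum of per-sequence query lengths (length batch size + 1, starting at 0). The function name `compute_query_start_loc` is hypothetical.

```python
from itertools import accumulate
from typing import List


def compute_query_start_loc(query_lens: List[int]) -> List[int]:
    """Exclusive prefix sum: entry i is the offset of sequence i's first query token
    in the flattened batch; the last entry is the total number of query tokens."""
    return [0] + list(accumulate(query_lens))


# Three sequences contributing 5, 1 and 3 query tokens (made-up lengths).
query_start_loc = compute_query_start_loc([5, 1, 3])  # [0, 5, 6, 9]
```

Because this is plain integer arithmetic on the host, no intermediate device tensor is required. In PyTorch, one option is to wrap the result with `torch.tensor(..., dtype=torch.int32).pin_memory()` and move it with `.to(device, non_blocking=True)`, so the copy can overlap with other host work.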
In speculative decoding, the count of accepted tokens should reflect the real number of generated tokens.
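One way the accepted-token count and the emitted-token count can diverge is sketched below. It assumes the standard rejection-sampling scheme, where each step emits its accepted draft tokens plus one target-model token (the correction on rejection, or the bonus token when every draft passes), and that tokens past a length limit are discarded. `count_generated_tokens` and `max_new_tokens` are hypothetical names, not vLLM's metric code.

```python
from typing import List, Optional


def count_generated_tokens(accepted_per_step: List[int], max_new_tokens: Optional[int] = None) -> int:
    """Count tokens actually emitted, assuming each step yields its accepted draft
    tokens plus one token sampled from the target model."""
    total = 0
    for accepted in accepted_per_step:
        total += accepted + 1
        if max_new_tokens is not None and total >= max_new_tokens:
            # Tokens past the limit are dropped, so they should not count as generated.
            return max_new_tokens
    return total


print(count_generated_tokens([2, 0, 3]))     # 8
print(count_generated_tokens([2, 0, 3], 5))  # 5: the last step is truncated
```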
What is activate_this.py used for? And why, when --activate_venv is NOT set, is the venv still forcibly activated?
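For context, virtualenv (not the stdlib `venv` module) ships a `bin/activate_this.py` script that activates an environment inside an already-running interpreter when it is executed with `__file__` set to its own path. The sketch below shows the expected opt-in behaviour, where nothing is activated unless the flag is passed. The function `maybe_activate_venv` and the `--venv_dir` option are hypothetical. Only the flag name `--activate_venv` comes from the question.

```python
import argparse
import os
from typing import List, Optional


def maybe_activate_venv(argv: Optional[List[str]] = None) -> bool:
    """Activate a virtualenv in-process only when --activate_venv is given (hypothetical CLI)."""
    parser = argparse.ArgumentParser()
    # store_true: the flag defaults to False, so activation is strictly opt-in.
    parser.add_argument("--activate_venv", action="store_true")
    parser.add_argument("--venv_dir", default=".venv")  # hypothetical option
    args = parser.parse_args(argv)
    if not args.activate_venv:
        return False
    activate_this = os.path.join(args.venv_dir, "bin", "activate_this.py")
    # activate_this.py locates its environment via __file__, so pass it in the globals.
    with open(activate_this) as f:
        exec(f.read(), {"__file__": activate_this})
    return True
```

With this design, a script that activates the venv even when the flag is absent would be skipping the `if not args.activate_venv` guard.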