XiongfeiWei

Results 18 issues of XiongfeiWei

### Description
Hi. I am extending the Pallas paged attention kernel for an MQA (multi-query attention) case. When I run my kernel, I encounter the following error, which suggests it is...

bug

This PR integrates the new ragged paged attention kernel with vLLM v1 on TPU. In particular, this PR:
- Updates the torch_xla pin to the latest
- Updates pallas.py in v1...

needs-rebase
ci/build
v1

Use the optimized block sizes after tuning the kernel.

v1

Reduce the size of block_table by getting rid of padding. Test plan:
1. `VLLM_USE_V1=1 pytest -s -v vllm/tests/entrypoints/llm/test_accuracy.py::test_lm_eval_accuracy_v1_engine 2>&1 | tee out.txt`
2. `VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.1-8B-Instruct --disable-log-requests`...

v1

This PR enables gemma3-27b with TP>1 across multiple chips. Without the change, it fails with an error:
```
callstack:
Traceback (most recent call last):
  File "/home/xiowei/vllm/vllm/v1/executor/multiproc_executor.py", line 465, in worker_busy_loop
    output...
```

tpu
v1