XiongfeiWei
### Description Hi. I am extending the Pallas paged attention kernel. The case is MQA (multi-query attention). When I run my kernel, I encounter the following error, which suggests it is...
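For context, the defining property of MQA is that a single shared K/V head serves every query head. Below is a minimal sketch of that shape handling in plain jax.numpy, not the actual Pallas kernel being extended; every name in it is illustrative rather than taken from vLLM or torch_xla.

```python
import jax
import jax.numpy as jnp

def mqa_attention(q, k, v):
    """q: [num_q_heads, q_len, head_dim]; k, v: [kv_len, head_dim].

    In MQA, K and V carry no head axis: the one shared KV head is
    broadcast across all query heads.
    """
    scores = jnp.einsum("hqd,kd->hqk", q, k) / jnp.sqrt(q.shape[-1])
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("hqk,kd->hqd", weights, v)

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(k1, (8, 16, 64))   # 8 query heads
k = jax.random.normal(k2, (16, 64))      # 1 shared KV head
v = jax.random.normal(k3, (16, 64))
print(mqa_attention(q, k, v).shape)      # (8, 16, 64)
```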
This PR integrates the new ragged paged attention kernel with vLLM v1 on TPU. In particular, this PR:
- Updates the torch_xla pin to the latest
- Updates pallas.py in v1...
Use the optimized block sizes after tuning the kernel.
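One plausible shape for such a change, shown purely as a hypothetical sketch (the keys, values, and function name below are illustrative, not the PR's actual code), is a lookup table of tuned block sizes with a conservative fallback for untuned configurations:

```python
# Hypothetical sketch only: tuned block sizes keyed on kernel parameters.
# (head_dim, page_size) -> (num_kv_pages_per_block, num_queries_per_block)
_TUNED_BLOCK_SIZES = {
    (128, 16): (16, 128),
    (128, 32): (8, 128),
}

def get_block_sizes(head_dim: int, page_size: int) -> tuple[int, int]:
    # Fall back to a conservative default for untuned configurations.
    return _TUNED_BLOCK_SIZES.get((head_dim, page_size), (8, 64))
```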
Reduce the size of block_table by getting rid of padding (see the sketch after the test plan).

Test plan:
1. `VLLM_USE_V1=1 pytest -s -v vllm/tests/entrypoints/llm/test_accuracy.py::test_lm_eval_accuracy_v1_engine 2>&1 | tee out.txt`
2. `VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.1-8B-Instruct --disable-log-requests...`
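As a rough illustration of the idea (not the PR's actual code; the function and argument names are made up), dropping padded columns from the block table can look like this:

```python
import numpy as np

def trim_block_table(block_table: np.ndarray,
                     num_pages_per_seq: np.ndarray) -> np.ndarray:
    """Drop padded columns from a [num_seqs, max_pages] block table.

    Instead of padding every row out to the maximum number of pages the
    model could ever need, keep only as many columns as the longest live
    sequence actually uses.
    """
    max_used = int(num_pages_per_seq.max()) if num_pages_per_seq.size else 0
    return block_table[:, :max_used]
```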
This PR enables gemma3-27b with TP > 1 on multiple chips. Without the change, it fails with the following error:
```
callstack:
Traceback (most recent call last):
  File "/home/xiowei/vllm/vllm/v1/executor/multiproc_executor.py", line 465, in worker_busy_loop
    output...
```