XiongfeiWei comments

Results: 61 comments of XiongfeiWei

[TPU] Enable gemma3-27b with TP>1 on multi-chips.

Somehow, I still couldn't see my TPU CI running (Is it because all the tests are run in sequence and a CI before the TPU CI gets stuck and blocks...

[TPU] Enable gemma3-27b with TP>1 on multi-chips.

The failing CI appears to be timing out. I don't see why my PR would cause that.

[TPU] Enable gemma3-27b with TP>1 on multi-chips.

Thanks @mgoin. I also ran some checks on my A100 VM. For the 2 failing tests:

- `VLLM_USE_V1=1 pytest -s -vv tests/mq_llm_engine/test_error_handling.py::test_mp_crash_detection`: it fails on the main branch (4c33d6732148fdaeb9780fa86fca1f87f2a93c19)...

INTERNAL: Mosaic failed to compile TPU kernel: unsupported shape cast

The problematic line is `o_ref[:, q_head_idx, :] = acc_scratch_ref[:].astype(o_ref.dtype)`. I found a way to work around the problem (the code is in https://github.com/jax-ml/jax/issues/24415). But I'm trying to figure out why...

INTERNAL: Mosaic failed to compile TPU kernel: unsupported shape cast

It seems the assignee is not set when I use the link https://github.com/google/jax/issues/new?assignees=apaszke in the error message to create the issue. So manually cc @apaszke

INTERNAL: Mosaic failed to compile TPU kernel: unsupported shape cast

Thanks Justin for the explanation!

Reduce the size of block_table by getting rid of padding.

> NUM_KV_PAGES_PER_BLOCK is no longer used in tpu_model_runner.py after this change. Is that intentional?

Yeah, NUM_KV_PAGES_PER_BLOCK is used for padding. Since we don't need to pad anymore, we no longer...
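As a rough illustration of what "padding" a block table can mean, here is a minimal sketch that assumes the padding rounded the per-request block count up to a multiple of a page-group constant. The helper names and all numeric values are hypothetical and are not taken from tpu_model_runner.py; `pages_per_block` only stands in for NUM_KV_PAGES_PER_BLOCK.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return -(-a // b)


def padded_width(num_blocks: int, multiple: int) -> int:
    """Round num_blocks up to the next multiple of `multiple`."""
    return cdiv(num_blocks, multiple) * multiple


# Hypothetical values, chosen only for illustration.
max_model_len = 2048
block_size = 16
pages_per_block = 24  # stand-in for NUM_KV_PAGES_PER_BLOCK (assumed value)

# Number of block-table columns a single request can actually need.
max_blocks_per_req = cdiv(max_model_len, block_size)

# Width with padding (assumed old behaviour) versus without it.
padded = padded_width(max_blocks_per_req, pages_per_block)
saved_columns = padded - max_blocks_per_req
```

Under these assumed numbers, the unpadded table is narrower by `saved_columns` entries per request, which is the kind of size reduction the PR title describes.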

Reduce the size of block_table by getting rid of padding.

Thanks @mgoin for the review!

Introduce CUDA OpenXLA fallback.

> The CI error is a bit tricky to solve.
>
> **Problem:** I'm using some CUDA functions defined inside PyTorch, which requires linking _libc10_cuda.so_ to the test binaries. However,...

Introduce CUDA OpenXLA fallback.

For problem 1, "Problem1: C++ test binaries need all references to be resolved", you mentioned "Solution: Create a fallback implementation of the CUDA functions". Could you point to...