Jee Jee Li
@DarkLight1337 @ywang96 I have added the LoRA test
@DarkLight1337 It looks like we need a force merge
> Is this for testing only? The example scripts shouldn't use LoRA. I implemented this example by referring to https://github.com/vllm-project/vllm/pull/14119#issue-2890374044. It looks like LoRA is necessary.
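For reference, a minimal sketch of how an offline example can enable LoRA for this model. The model path, the `vision-lora` adapter location, `max_lora_rank=320`, and the prompt template are assumptions based on the Phi-4-MM repo layout, not the final example code:

```python
# Minimal sketch, not the merged example: assumes the vision LoRA adapter
# ships inside the Phi-4-MM model repo and that rank 320 is large enough.
import os

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset
from vllm.lora.request import LoRARequest

model_path = snapshot_download("microsoft/Phi-4-multimodal-instruct")
vision_lora_path = os.path.join(model_path, "vision-lora")  # assumed layout

llm = LLM(
    model=model_path,
    enable_lora=True,       # the example only works with LoRA enabled
    max_lora_rank=320,      # assumed rank of the bundled vision adapter
    limit_mm_per_prompt={"image": 1},
)

prompt = "<|user|><|image_1|>Describe this image.<|end|><|assistant|>"
image = ImageAsset("cherry_blossom").pil_image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("vision", 1, vision_lora_path),
)
print(outputs[0].outputs[0].text)
```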
@DarkLight1337 @Isotr0py I have added support for all PHI4MM examples, but due to issues with my local network, I haven't actually tested the multi-image examples.
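The untested multi-image path would only differ in the per-prompt image limit, the number of `<|image_i|>` placeholders, and passing a list of images; a sketch under the same assumptions as above:

```python
# Sketch of the untested multi-image path (same assumptions as the
# single-image sketch): two <|image_i|> placeholders and a list of images.
import os

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset
from vllm.lora.request import LoRARequest

model_path = snapshot_download("microsoft/Phi-4-multimodal-instruct")
vision_lora_path = os.path.join(model_path, "vision-lora")  # assumed layout

llm = LLM(
    model=model_path,
    enable_lora=True,
    max_lora_rank=320,                 # assumed adapter rank
    limit_mm_per_prompt={"image": 2},  # allow two images per prompt
)

prompt = ("<|user|><|image_1|><|image_2|>"
          "What are the differences between these two images?<|end|><|assistant|>")
images = [ImageAsset("cherry_blossom").pil_image,
          ImageAsset("stop_sign").pil_image]

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": images}},
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("vision", 1, vision_lora_path),
)
print(outputs[0].outputs[0].text)
```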
cc @Yard1 @WoosukKwon
@Yard1 How can I coordinate with @FurtherAI? I'm happy to work together to push this feature forward, but it seems like we have different approaches to kernel implementation. Which one...
@FurtherAI Let's toast to these wonderful similarities; I'm glad to see like-minded people addressing these issues. Here is my response: - can you tell me about any limitations or assumptions?...
@FurtherAI Apologies for the late reply over the weekend. I've provided some code in `temp_test.py`
> Currently this impl still has two kernels dealing with shrink and expand separately. I wonder whether we could merge them into one, so that Triton could do the pipeline...
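For context, "shrink" and "expand" here are the two halves of the LoRA matmul (hidden -> rank, then rank -> hidden). A plain PyTorch sketch of what the two kernels compute, with made-up shapes and not the Triton implementation itself:

```python
# Conceptual sketch of the two LoRA stages discussed above.
import torch

x = torch.randn(8, 4096)        # input activations (tokens x hidden)
lora_a = torch.randn(4096, 16)  # "shrink" weight: hidden -> rank
lora_b = torch.randn(16, 4096)  # "expand" weight: rank -> hidden

# Two separate kernels: shrink writes an intermediate buffer,
# expand reads it back from global memory.
intermediate = x @ lora_a       # shrink
delta = intermediate @ lora_b   # expand

# A fused kernel would keep the intermediate in registers/shared memory
# and produce the output directly, which is what pipelining the two
# stages inside one Triton kernel would buy.
fused_delta = (x @ lora_a) @ lora_b
assert torch.allclose(delta, fused_delta)
```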
> @jeejeelee So it looks like our kernels accomplish two partly different goals. > > Yours can function as a drop-in replacement for the current Punica kernels. I have...