Jee Jee Li
@DarkLight1337 @ywang96 I have added the LoRA test
@DarkLight1337 It looks like we need a force merge
> Is this for testing only? The example scripts shouldn't use LoRA. I implemented this example by referring to https://github.com/vllm-project/vllm/pull/14119#issue-2890374044. It looks like LoRA is necessary.
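For reference, a minimal sketch of how an offline example can enable LoRA for this model. The model path, the `vision-lora` adapter location, `max_lora_rank=320`, and the prompt template are assumptions based on the Phi-4-MM repo layout, not the final example code:

```python
# Minimal sketch, not the merged example: assumes the vision LoRA adapter
# ships inside the Phi-4-MM model repo and that rank 320 is large enough.
import os

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset
from vllm.lora.request import LoRARequest

model_path = snapshot_download("microsoft/Phi-4-multimodal-instruct")
vision_lora_path = os.path.join(model_path, "vision-lora")  # assumed layout

llm = LLM(
    model=model_path,
    enable_lora=True,       # the example only works with LoRA enabled
    max_lora_rank=320,      # assumed rank of the bundled vision adapter
    limit_mm_per_prompt={"image": 1},
)

prompt = "<|user|><|image_1|>Describe this image.<|end|><|assistant|>"
image = ImageAsset("cherry_blossom").pil_image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("vision", 1, vision_lora_path),
)
print(outputs[0].outputs[0].text)
```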
@DarkLight1337 @Isotr0py I have added support for all PHI4MM examples, but due to issues with my local network, I haven't actually tested the multi-image examples.
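The untested multi-image path would only differ in the per-prompt image limit, the number of `<|image_i|>` placeholders, and passing a list of images; a sketch under the same assumptions as above:

```python
# Sketch of the untested multi-image path (same assumptions as the
# single-image sketch): two <|image_i|> placeholders and a list of images.
import os

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset
from vllm.lora.request import LoRARequest

model_path = snapshot_download("microsoft/Phi-4-multimodal-instruct")
vision_lora_path = os.path.join(model_path, "vision-lora")  # assumed layout

llm = LLM(
    model=model_path,
    enable_lora=True,
    max_lora_rank=320,                 # assumed adapter rank
    limit_mm_per_prompt={"image": 2},  # allow two images per prompt
)

prompt = ("<|user|><|image_1|><|image_2|>"
          "What are the differences between these two images?<|end|><|assistant|>")
images = [ImageAsset("cherry_blossom").pil_image,
          ImageAsset("stop_sign").pil_image]

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": images}},
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("vision", 1, vision_lora_path),
)
print(outputs[0].outputs[0].text)
```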
cc @Yard1 @WoosukKwon
@Yard1 How can I coordinate with @FurtherAI? I'm happy to work together to push this feature forward, but it seems like we have different approaches to kernel implementation. Which one...
@FurtherAI Let's toast to these wonderful similarities; I'm glad to see like-minded people addressing these issues. Here is my response: - can you tell me about any limitations or assumptions?...
@FurtherAI Apologies for the late reply over the weekend. I've provided some code in `temp_test.py`
> Currently this impl still has two kernels dealing with shrink and expand separately. I wonder whether we could merge them into one, so that Triton could do the pipeline...
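For context, "shrink" and "expand" here are the two halves of the LoRA matmul (hidden -> rank, then rank -> hidden). A plain PyTorch sketch of what the two kernels compute, with made-up shapes and not the Triton implementation itself:

```python
# Conceptual sketch of the two LoRA stages discussed above.
import torch

x = torch.randn(8, 4096)        # input activations (tokens x hidden)
lora_a = torch.randn(4096, 16)  # "shrink" weight: hidden -> rank
lora_b = torch.randn(16, 4096)  # "expand" weight: rank -> hidden

# Two separate kernels: shrink writes an intermediate buffer,
# expand reads it back from global memory.
intermediate = x @ lora_a       # shrink
delta = intermediate @ lora_b   # expand

# A fused kernel would keep the intermediate in registers/shared memory
# and produce the output directly, which is what pipelining the two
# stages inside one Triton kernel would buy.
fused_delta = (x @ lora_a) @ lora_b
assert torch.allclose(delta, fused_delta)
```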
> @jeejeelee So it looks like our kernels accomplish two partly different goals. > > Yours can function as a drop-in replacement for the current Punica kernels. I have...