Thomas Parnell

24 comments by Thomas Parnell

@DarkLight1337 CI issues are fixed now.

Is there any script that I can use to reproduce this issue? I've been looking into #5607, which appears related, but after some digging, that bug seems to be related...

Yeah, we've fixed this issue on our fork (as you found [here](https://github.com/IBM/vllm/pull/35)). Let me create a PR to contribute the fix upstream.

@randxie Interesting. I actually tried to test [these changes](https://github.com/triton-lang/triton/pull/3544) that were merged into Triton main in [our fork](https://github.com/IBM/vllm/pull/34), but it didn't help. I don't really see much else that...

There was a PR merged into Triton yesterday that tries to address this issue: https://github.com/triton-lang/triton/pull/4295. This fix is not yet included in `triton==3.0.0` which was released on PyPI yesterday.

So I've been digging into this a bit more and here is a summary of my findings:

- Triton recently released v3.0.0, but it does **not** seem to include the...

Fix #6140 is ready from my pov; will try to get it approved and merged asap.

> I am fine having this in, can we log once if this happens so there's a hint of the performance degradation to users?

I added a warning when we...

@njhill I saw you cleaned up this code recently. Did you happen to check the case with chunked prefill too? It looked like it was broken a couple of weeks...

Thanks @jeejeelee, but that issue is related to prefill performance. A quick look using the torch profiler indicates that the majority of time is spent in the decode kernel for both backends: using...