Luka Govedič comments


Results: 93 comments by Luka Govedič

[Feature]: Improve startup time UX

Just saw this in a [comment](https://github.com/vllm-project/vllm/pull/17280#issuecomment-3013759134):

> Furthermore the very first request after starting up vLLM takes 30-60 seconds. Feels like PTX being compiled or something. This only happens on...

[Bug]: FlashInfer attention backend on Hopper fails with llama4-scout and llama3 with fp8 kvcache

This might be the cause of the other issue I filed (llama4 on Blackwell), but this issue is llama4 AND llama3 on Hopper.

Compiler crash in the `CalledValuePropagationPass`

GitHub isn't letting me upload the file, so here's a [Google Drive link](https://drive.google.com/file/d/1szYpuBoQnZ0Xo4SuBbSaTG38k7KImUyS/view?usp=drive_link). Let me know if that works.