Scott Ricketts

22 comments by Scott Ricketts

Main remaining items:

- Fix module-status test (currently hitting error during module registration)
- Maybe: add additional module tests as listed by TODO items in the code (or we can...

@mgoin can we close this? The cubins can be prefetched by either of the following:

1. `flashinfer-cubin` wheel
2. `./python -m flashinfer --download-cubin`

(1) is not in production yet, but...

Thanks for flagging. Will review and follow up with an ETA for looking at this.

Planning for @sunghyunp-nvdia to start looking at this; we'll aim to post an update by early next week.

We're working on prioritizing ideas from [[Performance]: Custom fused kernel tracking · Issue #25179 · vllm-project/vllm](https://github.com/vllm-project/vllm/issues/25179) and will add to the roadmap once we have a more concrete plan.

> Is there any plan for Cute DSL support on following kernels ?

@rainj-me when choosing how to implement kernels in FlashInfer, we consider a number of factors (e.g. perf,...

> > > Is there any plan for Cute DSL support on following kernels ?
> > >
> > > [@rainj-me](https://github.com/rainj-me) when choosing how to implement kernels in FlashInfer,...

Thanks for flagging. Will review and follow up with an ETA for looking at this.

@sunghyunp-nvdia has done some initial investigation here. For the "compute-bound bench" (`M, N, K = 8192, 8192, 8192`), here's what we're measuring:

- `cutlass`: 5184 TFLOPS
- `cudnn`: 5956 TFLOPS...
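The comment does not say how the TFLOPS figures were derived. A common convention for GEMM benchmarks counts `2 * M * N * K` floating-point operations (one multiply and one add per inner-product term); the sketch below assumes that convention and shows how to convert between measured kernel time and TFLOPS, e.g. to see what per-call latency the reported numbers imply for the 8192³ shape. The helper names are hypothetical, not part of FlashInfer or the benchmark script.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOP count for C[m, n] = A[m, k] @ B[k, n], assuming the 2*M*N*K convention."""
    return 2 * m * n * k


def tflops_from_time(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS given the measured time of one GEMM call."""
    return gemm_flops(m, n, k) / seconds / 1e12


def time_ms_from_tflops(m: int, n: int, k: int, tflops: float) -> float:
    """Per-call time in milliseconds implied by a reported TFLOPS figure."""
    return gemm_flops(m, n, k) / (tflops * 1e12) * 1e3


if __name__ == "__main__":
    m = n = k = 8192  # shape from the "compute-bound bench"
    # Reported throughputs from the comment; backend labels copied as-is.
    for backend, rate in [("cutlass", 5184.0), ("cudnn", 5956.0)]:
        ms = time_ms_from_tflops(m, n, k, rate)
        print(f"{backend}: {rate:.0f} TFLOPS -> {ms:.3f} ms per GEMM")
```

If the benchmark uses a different FLOP-counting convention, the implied times will scale accordingly; the relative gap between backends is unaffected.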