Stefan He

Results 32 comments of Stefan He

@tridao I wonder when the FA3 Blackwell version will come out? Looking forward to it!

![Image](https://github.com/user-attachments/assets/08b77197-bb47-4a99-9a2e-9af7a6d09c5c) @EduardDurech Hi Eduard, thanks for your detailed profiling. I've done some profiling on our side by running Qwen 7B GRPO using almost the same setup as verl's recipe....

@EduardDurech Hi Eduard, tbh I don't have an insightful update, but just to share what I did: Some interesting findings: - In CUDA 12.6, sgl and vllm are on par -...

> * It is weird that gen is slower in veRL though than standalone, no? SGLang is roughly twice the throughput for me in normal inference [veRL-SGLang slower than expected...

> Open DP attention, MTP, cuda graph found that the performance dropped very much, analyzed and found that it was because the reception rate dropped very much. This caused the...

@pengcuo Currently the FA3 API only supports headdim

> me too... > > # Environment > > sglang image == 0.4.6.post2.cu124 > model == Qwen3/Qwen3-235B-A22B > FA3 attention backend > > > # Result > > sglang Mean...

Hi @tridao, really appreciate your work! I'm curious about what "FA3 Ampere" refers to. As I understand it, most of FA3's improvements come from Hopper GPU features. So how does...

@kumare3 Thanks for the reply. Regarding the `cache_version` solution, it would work for ad hoc/experimental workflow code (though it requires some level of understanding of how caching works in Flyte, I know...