Stefan He
@tridao I wonder when the FA3 Blackwell version will come out. Looking forward to it!
@EduardDurech Hi Eduard, thanks for your detailed profiling. I've done some profiling on our side by running Qwen 7B GRPO with almost the same setup as verl's recipe...
@EduardDurech Hi Eduard, tbh I don't have an insightful update, but just to share what I did. Some interesting findings:
- In CUDA 12.6, sgl and vllm are on par
-...
> * It is weird that gen is slower in veRL though than standalone, no? SGLang is roughly twice the throughput for me in normal inference [veRL-SGLang slower than expected...
> After enabling DP attention, MTP, and CUDA graph, I found that performance dropped significantly. Analysis showed this was because the reception rate dropped significantly. This caused the...
@pengcuo Currently the FA3 API only supports headdim
> me too...
>
> # Environment
>
> sglang image == 0.4.6.post2.cu124
> model == Qwen3/Qwen3-235B-A22B
> FA3 attention backend
>
> # Result
>
> sglang Mean...
Hi @tridao, really appreciate your work! I'm curious about what "FA3 Ampere" refers to. As I understand it, most of FA3's improvements come from Hopper GPU features. So how does...
@kumare3 Thanks for the reply. Regarding the `cache_version` solution, it would work for ad hoc/experimental workflow code (though it requires some level of understanding of how caching works in Flyte, I know...