siriluo
I spent a lot of effort on this issue as well. vLLM's latest release, v0.10.2, supports ARM but ships without flash-attention; Nvidia's latest NGC containers support flash-attention, but I can't...
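For anyone comparing environments, a quick sanity check I use inside a container or venv looks roughly like this (just an illustrative probe, assuming a PyTorch install; the package names are optional imports, not requirements):

```python
import importlib.util
import platform

import torch

# Basic platform info: on GH200 this should report aarch64.
print("machine:", platform.machine())
print("torch:", torch.__version__, "cuda:", torch.version.cuda)

# Probe optional packages without failing if they are absent.
# FA2 is importable as `flash_attn`, FA3 as `flash_attn_interface`.
for pkg in ("flash_attn", "flash_attn_interface", "vllm", "sglang"):
    spec = importlib.util.find_spec(pkg)
    print(f"{pkg}: {'found' if spec else 'not installed'}")
```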
> For veRL-SGLang on a GH200 aarch64 cluster, I got the installation working, and standalone SGLang works, as does veRL-vLLM; however, something seems to be not working well with veRL-SGLang, as it's...
It seems that the availability of prebuilt flash-attn wheels on aarch64 depends on fixing those GPU backend issues (sm90/sm100) first. Is this true for all flash-attn versions, or only for...
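At least the sm90 vs. sm100 part can be checked locally; something like the sketch below (nothing official, just how the mismatch usually surfaces as a "no kernel image" error) tells you which backend a wheel would need to target:

```python
import torch

# Compute capability determines which flash-attention build you need:
# (9, 0) -> sm90 (Hopper, e.g. the H100 in a GH200), (10, 0) -> sm100 (Blackwell).
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability sm{major}{minor}")

# Smoke-test the FA2 kernel if a wheel is installed; an arch mismatch
# typically fails here with a "no kernel image is available" error.
try:
    from flash_attn import flash_attn_func

    q = k = v = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
    flash_attn_func(q, k, v)
    print("flash-attn kernel ran OK on this GPU")
except Exception as exc:  # ImportError or a CUDA kernel error
    print("flash-attn not usable here:", exc)
```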
> I published some at https://huggingface.co/datasets/malaysia-ai/Flash-Attention3-wheel,
>
> ## Flash-Attention3-wheel
>
> Flash Attention 3 wheels on commit [0e60e39473e8df549a20fb5353760f7a65b30e2d](https://github.com/Dao-AILab/flash-attention/commit/0e60e39473e8df549a20fb5353760f7a65b30e2d).
>
> ### Build using H100
>
> For PyTorch 2.6.0 12.6, 2.7.0...
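For anyone trying those wheels, a rough sketch of pulling one straight from that dataset repo is below (the wheel chosen here is illustrative only; check the repo listing and pick the file matching your Python/torch/CUDA/platform tags):

```python
import subprocess
import sys

from huggingface_hub import hf_hub_download, list_repo_files

REPO = "malaysia-ai/Flash-Attention3-wheel"

# List the wheel files published in the dataset repo.
files = [f for f in list_repo_files(REPO, repo_type="dataset") if f.endswith(".whl")]
print("\n".join(files))

# Pick whichever wheel matches your environment, e.g. filter on
# f"cp{sys.version_info.major}{sys.version_info.minor}" and your platform tag.
wheel = files[0]  # illustrative only
path = hf_hub_download(REPO, wheel, repo_type="dataset")
subprocess.check_call([sys.executable, "-m", "pip", "install", path])
```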