GB200 support
I was wondering if there are plans to support GB200s. Right now all of the packages and Docker materials are for Linux x86_64, but GB200s are aarch64. I have tried to set this up myself, but I am running into a lot of trouble. Would it be possible to get a working setup script for this?
I doubt PyTorch even publishes aarch64 CUDA wheels yet?
https://download.pytorch.org/whl/cu128 has the aarch64 CUDA wheels for torch (I don't think PyPI has the non-CPU version). I think both SGLang and vLLM support GB200s, so it should be possible? I have just been running into many issues with my own installation; it would be really helpful to have a script or something to use.
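For what it's worth, here is a minimal sanity-check sketch I use after installing from that index, to confirm I actually got a CUDA-enabled aarch64 build rather than a CPU wheel (the cu128 index URL is the one linked above; everything else is just standard torch/platform introspection):

```python
# Sanity check for an aarch64 CUDA PyTorch install, e.g. after:
#   pip install torch --index-url https://download.pytorch.org/whl/cu128
import platform

import torch

# Confirm we are on an aarch64 host (GB200 = Grace CPU + Blackwell GPU)
assert platform.machine() == "aarch64", f"unexpected arch: {platform.machine()}"

# A CPU-only wheel reports torch.version.cuda as None
assert torch.version.cuda is not None, "got a CPU-only torch wheel"
print(f"torch {torch.__version__}, CUDA {torch.version.cuda}")

if torch.cuda.is_available():
    # On a GB200 node this should report the Blackwell GPU
    print("device:", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))
else:
    print("CUDA runtime not available -- check driver / container setup")
```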
I spent a lot of effort on this issue as well. vLLM's latest release, v0.10.2, supports ARM, but without flash-attention; NVIDIA's latest NGC containers ship flash-attention, but I can't make them compatible with vLLM. I wish someone could release a working wheel.
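One partial workaround sketch, assuming vLLM's `VLLM_ATTENTION_BACKEND` environment variable behaves the same in the ARM build (I haven't verified this on GB200, and whether the alternative backends are compiled into the aarch64 wheel is an assumption): force a backend that doesn't need the flash-attn wheel at all.

```python
# Workaround sketch: run vLLM without flash-attention by forcing an
# alternative attention backend. VLLM_ATTENTION_BACKEND is vLLM's
# backend-override env var; whether FLASHINFER / TORCH_SDPA are
# available in the ARM wheel is an assumption -- adjust as needed.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"  # or "TORCH_SDPA"

from vllm import LLM, SamplingParams  # import after setting the env var

# Model choice here is hypothetical -- use whatever you are serving.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
out = llm.generate(["Hello from GB200"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```

If the forced backend isn't compiled in, vLLM should fail fast at startup, which at least tells you what the ARM wheel actually contains.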
Does anyone know if GB200 support is part of the verl roadmap?
I am also interested in the timeline.
I am working on GB200 support now (WIP).