Daniel Koceja
GCC 10 doesn't support `std::bit_cast` ([p0476r2](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0476r2.html)), but GCC 11 does. You can check the implementation status of the different standard library features for each compiler here: [GCC 10 implementation status](https://gcc.gnu.org/onlinedocs/gcc-10.5.0/libstdc++/manual/manual/status.html#status.iso.2020) [GCC 11 implementation...
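If the code has to build on both compilers, one possible workaround is to test the standard `__cpp_lib_bit_cast` feature-test macro and fall back to `std::memcpy` when it is missing. This is only a rough sketch: `portable_bit_cast` and `example` are hypothetical names made up for illustration, and the fallback is not `constexpr` and additionally requires `To` to be default-constructible.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

#if __has_include(<version>)
#include <version>  // library feature-test macros, e.g. __cpp_lib_bit_cast
#endif
#if defined(__cpp_lib_bit_cast)
#include <bit>      // std::bit_cast (libstdc++ provides it from GCC 11 in C++20 mode)
#endif

// Hypothetical helper (not part of any library): uses std::bit_cast when the
// standard library advertises it, otherwise copies the bytes with std::memcpy.
template <class To, class From>
To portable_bit_cast(const From& src) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable_v<To> &&
                  std::is_trivially_copyable_v<From>,
                  "both types must be trivially copyable");
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<To>(src);
#else
    // Fallback path, e.g. GCC 10: not usable in constant expressions.
    static_assert(std::is_default_constructible_v<To>,
                  "fallback needs a default-constructible destination");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
#endif
}

// Example use, assuming IEEE-754 single precision: 1.0f is 0x3F800000.
inline void example() {
    assert(portable_bit_cast<std::uint32_t>(1.0f) == 0x3F800000u);
}
```

Both branches return the same bit pattern at run time; the difference is that only the `std::bit_cast` branch can be evaluated at compile time.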
Is there any update on this? Is someone looking into it?
[BUG] Qwen3 MoE with FSDP2 meets `torch.utils.checkpoint.CheckpointError` when `offload_policy=True`
I think there is a chance that setting `use_reentrant=True` in the FSDP worker could fix it, but I personally have only seen really bad performance with FSDP2 for large MoE...
https://download.pytorch.org/whl/cu128 has the aarch64 CUDA wheels for torch (I don't think PyPI has the non-CPU version). I think both SGLang and vLLM support GB200s, so it should be possible? I...
Does anyone know if GB200 support is part of the verl timeline?