
[BUG] Failure to compile from `cuda::atomic_ref`

Open ad8e opened this issue 6 months ago • 1 comments

The error message is:

/home/kevin/flux/include/flux/cuda/cuda_common_device.hpp(34): error: namespace "cuda" has no member "atomic_ref"
  using atomic_ref_sys = cuda::atomic_ref<T, cuda::thread_scope_system>;

The solution is https://github.com/bytedance/flux/issues/105#issuecomment-2814305710:

So I tried to change all #include <cuda/std/atomic> to #include <cuda/atomic> in the following files, and then there will be no more errors about atomic_ref.

FLUX/flux/include/flux/cuda/cuda_common_device.hpp
FLUX/flux/src/moe_ag_scatter/cutlass_impls/ag_scatter_gemm_grouped_with_absmax.h
FLUX/flux/src/moe_gather_rs/cutlass_impls/gather_rs_gemm_grouped_with_absmax.h
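
For reference, a minimal sketch of what the include change amounts to, assuming libcu++ (shipped with the CUDA toolkit) is on the include path. The alias mirrors the line from the error message. `add_one_system_scope` and the `SKETCH_HAS_CUDA_ATOMIC` macro are hypothetical names for illustration, and the fallback branch is a non-atomic stand-in that exists only so the snippet still builds on a machine without the toolkit.

```cpp
// Detect whether the libcu++ header is available (hypothetical guard macro).
#if defined(__has_include)
#  if __has_include(<cuda/atomic>)
#    define SKETCH_HAS_CUDA_ATOMIC 1
#  endif
#endif

#ifdef SKETCH_HAS_CUDA_ATOMIC
// <cuda/atomic> declares the cuda:: extensions such as cuda::atomic_ref and
// cuda::thread_scope_system; <cuda/std/atomic> covers the cuda::std:: types.
#include <cuda/atomic>

template <typename T>
using atomic_ref_sys = cuda::atomic_ref<T, cuda::thread_scope_system>;

// Hypothetical helper: atomically adds 1 and returns the previous value.
inline int add_one_system_scope(int &value) {
  atomic_ref_sys<int> ref(value);
  return ref.fetch_add(1, cuda::std::memory_order_relaxed);
}
#else
// Stand-in for builds without the CUDA toolkit. This is NOT atomic.
inline int add_one_system_scope(int &value) { return value++; }
#endif
```

Whether the same change is needed in other headers depends on which files use `cuda::atomic_ref` or the `cuda::thread_scope_*` enumerators.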

The `CUDA_VERSION > 12080` check in cuda_common_device.hpp does not work on my system. I get the error on both CUDA 12.8 (which does not satisfy the check) and CUDA 12.9.1 (which does). `#pragma message("cuda version" CUDA_VERSION)` produces an error, so I believe CUDA_VERSION was not yet defined at that location.
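
A caveat on the `#pragma message` test: `CUDA_VERSION` (defined by `cuda.h`) expands to an integer such as 12080, while `#pragma message` expects string literals, so that line is likely to be rejected whether or not the macro is defined. The sketch below stringifies the macro first, which should tell the two cases apart. The `SKETCH_STR` macros and the helper functions are hypothetical names, and `passes_check` only mirrors the comparison quoted above, not the actual FLUX source.

```cpp
#include <cassert>

// Pull in cuda.h only if the toolkit headers are on the include path.
#if defined(__has_include)
#  if __has_include(<cuda.h>)
#    include <cuda.h>  // defines CUDA_VERSION
#  endif
#endif

// Two-level stringification: the outer macro expands its argument,
// the inner one turns the expanded tokens into a string literal.
#define SKETCH_STR_IMPL(x) #x
#define SKETCH_STR(x) SKETCH_STR_IMPL(x)

#ifdef CUDA_VERSION
#pragma message("CUDA_VERSION = " SKETCH_STR(CUDA_VERSION))
#else
#pragma message("CUDA_VERSION is not defined at this point")
#endif

// CUDA_VERSION encodes major * 1000 + minor * 10, e.g. 12080 for CUDA 12.8.
inline int cuda_major(int version) {
  assert(version >= 0);
  return version / 1000;
}

inline int cuda_minor(int version) {
  assert(version >= 0);
  return (version % 1000) / 10;
}

// Mirrors the comparison quoted in the issue (assumption: strict '>').
inline bool passes_check(int version) { return version > 12080; }
```

Placing the `#pragma message` lines directly above the version check in the header would show what the preprocessor sees at that exact point.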

My installation commands were:

kevin@kevin-h100-0:/mnt/clusterstorage/workspace/kevin/flux$ git clone --recursive https://github.com/bytedance/flux.git && cd flux
kevin@kevin-h100-0:/mnt/clusterstorage/workspace/kevin/flux$ bash ./install_deps.sh
kevin@kevin-h100-0:/mnt/clusterstorage/workspace/kevin/flux$ ./build.sh --arch 90 --nvshmem

ad8e • Jun 30 '25 19:06

Thanks for your comment. FLUX has not been tested with CUDA 12.9. We will fix it later.

houqi • Jul 09 '25 07:07