flux [QUESTION] Are there targeted optimizations for the Ada architecture?

Hi, I noticed that you've been running benchmarks on the L20. May I ask whether there are targeted optimizations for the Ada architecture?

Mar 06 '25 02:03 dz1iang

Most of them are not tuned, but you can tune them yourself.

For GEMM+RS and AG+GEMM, use the tools here: https://github.com/bytedance/flux/tree/main/tools

For MoE-related kernels, there are no tuning tools yet. A PR is welcome.

Mar 06 '25 23:03 houqi

thx, i will try.

Mar 07 '25 06:03 dz1iang

We recently open-sourced the MoE part, and the related tuning script can be found here. You can use it for reference.

Mar 12 '25 01:03 wenlei-bao

I pulled the latest code immediately. When I compiled it on Ada, the following error occurred. Could you suggest how to fix it? @wenlei-bao

<bytedance::flux::GemmStreamkModeEnum::SK> >, bytedance::flux::None, cute::tuple<cute::C<128>, cute::C<128>, cute::C<64> >, cute::C<bytedance::flux::GemmKindEnum::GemmStreamK>, cute::C<3>, cute::C<bytedance::flux::GemmRasterOrderEnum::AlongM>}; bytedance::flux::OpRegistry::OpCreator = std::function<std::unique_ptr<bytedance::flux::GemmOperatorBase>()>]’
/flux/build/src/ag_gemm/registers/flux_bf16_bf16_bf16_bf16_fp32_fp32_sm89_agkernel_rrr_gemmv2_false_nil.cu:585:135:   required from here
/flux/include/flux/flux.h:1047:34: error: no match for ‘operator<’ (operand types are ‘const cute::tuple<long int, long int, long int>’ and ‘const cute::tuple<long int, long int, long int>’)
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |               ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note: candidate: ‘template<auto t, auto u> constexpr cute::C<(t < u)> cute::operator<(cute::C<v>, cute::C<v>)’
  237 | CUTE_BINARY_OP( <);
      | ^~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note:   template argument deduction/substitution failed:
/flux/include/flux/flux.h:1047:34: note:   ‘cute::tuple<long int, long int, long int>’ is not derived from ‘cute::C<v>’
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |               ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/flux/include/flux/flux.h:1047:88: error: no match for ‘operator<’ (operand types are ‘const cute::tuple<long int, long int, long int>’ and ‘const cute::tuple<long int, long int, long int>’)
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |                                                                                        ^
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note: candidate: ‘template<auto t, auto u> constexpr cute::C<(t < u)> cute::operator<(cute::C<v>, cute::C<v>)’
  237 | CUTE_BINARY_OP( <);
      | ^~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note:   template argument deduction/substitution failed:
/flux/include/flux/flux.h:1047:88: note:   ‘cute::tuple<long int, long int, long int>’ is not derived from ‘cute::C<v>’
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |                                                                                        ^
make[2]: *** [src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/build.make:92: src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/registers/flux_bf16_bf16_bf16_bf16_fp32_fp32_sm89_agkernel_rrr_gemmv2_false_nil.cu.o] Error 1
make[2]: Leaving directory '/flux/build'
make[1]: *** [CMakeFiles/Makefile2:565: src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/all] Error 2
make[1]: Leaving directory '/flux/build'
make: *** [Makefile:136: all] Error 2
+ merge_compile_commands
+ cd /flux
+ command -v ninja
++ ls './build/temp.*/build.ninja'
ls: cannot access './build/temp.*/build.ninja': No such file or directory
+ ninja -f -t compdb
ninja: error: loading '-t': No such file or directory

Mar 12 '25 06:03 dz1iang

Try cleaning your workspace, then follow README.md and recompile.

Note that you have to run install_deps.sh before build.sh: https://github.com/bytedance/flux/blob/main/install_deps.sh

Also make sure you compile with the right image. We use NVCC 12.4 + GCC 12.
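A quick way to compare a local toolchain against the NVCC 12.4 + GCC 12 combination mentioned above is sketched below. This is only an illustration, not a script from the flux repository: the helper names `nvcc_release`, `gcc_major`, and `check_toolchain` are hypothetical, and the parsing assumes the usual `release X.Y` line printed by `nvcc --version`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of flux) for checking the compiler versions.

# Read `nvcc --version` output on stdin and print the "X.Y" release number.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Read `gcc -dumpversion` output on stdin and print the major version.
gcc_major() {
  cut -d. -f1
}

# Compare against the versions named in the thread (NVCC 12.4, GCC 12).
check_toolchain() {
  local nvcc_ver="$1" gcc_ver="$2"
  if [ "$nvcc_ver" = "12.4" ] && [ "$gcc_ver" = "12" ]; then
    echo "ok"
  else
    echo "mismatch: nvcc=$nvcc_ver gcc=$gcc_ver"
    return 1
  fi
}

# Only query the real tools when both are installed.
if command -v nvcc >/dev/null 2>&1 && command -v gcc >/dev/null 2>&1; then
  check_toolchain "$(nvcc --version | nvcc_release)" \
                  "$(gcc -dumpversion | gcc_major)" || true
fi
```

A mismatch does not necessarily mean the build will fail; it only flags that the environment differs from the one the maintainers describe.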

Mar 12 '25 07:03 houqi

Do you use a base image, for example one from NGC? @houqi

Mar 13 '25 02:03 dz1iang

Maybe something is wrong. Here is a related issue: https://github.com/bytedance/flux/issues/60

@ZSL98 is working on it.

Mar 13 '25 03:03 houqi

Fixed by https://github.com/bytedance/flux/pull/66. Please give it a try.

Mar 14 '25 23:03 houqi