[QUESTION] Are there targeted optimizations for the Ada architecture?
Hi, I noticed that you've been running benchmarks on the L20. Are there targeted optimizations for the Ada architecture?
Most of them are not tuned; you can tune them yourself.
For GEMM+RS and AG+GEMM, use the tools here: https://github.com/bytedance/flux/tree/main/tools
For MoE-related ops there are no tools yet. A PR is welcome.
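As a rough illustration of what tuning means here, the sketch below does a brute-force search: it evaluates a cost for each candidate kernel configuration and keeps the cheapest one. It is not the interface of the scripts under flux/tools. `TileConfig` and `pick_fastest` are hypothetical names, and the fields only mirror parameters visible in the build log further down (a 128x128x64 tile, 3 stages). In a real run the cost would be the measured latency of the fused kernel on the target GPU, such as the L20.

```cpp
#include <array>
#include <cstddef>

// Hypothetical tile configuration, for illustration only. The fields mirror
// parameters seen in the build log (tile M/N/K and pipeline stages); this is
// not a real flux type.
struct TileConfig {
  int tile_m;
  int tile_n;
  int tile_k;
  int stages;
};

// Brute-force tuning: evaluate `measure` on every candidate and return the
// index of the one with the lowest cost (e.g. measured latency in ms).
// On ties, the earliest candidate wins.
template <std::size_t N, typename Measure>
constexpr std::size_t pick_fastest(const std::array<TileConfig, N>& candidates,
                                   Measure measure) {
  static_assert(N > 0, "need at least one candidate");
  std::size_t best = 0;
  double best_cost = measure(candidates[0]);
  for (std::size_t i = 1; i < N; ++i) {
    double cost = measure(candidates[i]);
    if (cost < best_cost) {
      best_cost = cost;
      best = i;
    }
  }
  return best;
}
```

For example, with the toy cost `1000.0 / (c.tile_m * c.tile_n) + 0.1 * c.stages` over the candidates {128, 128, 64, 3}, {128, 256, 64, 3} and {256, 128, 32, 4}, `pick_fastest` returns index 1. Exhaustive search is the simplest strategy and is usually affordable because the candidate space per problem shape is small.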
Thanks, I will try.
We recently open-sourced the MoE part, and the related tuning script can be found here. You can use it for reference.
I pulled the latest code right away. When I compiled it on Ada, the following error occurred. Could you suggest how to fix it? @wenlei-bao
<bytedance::flux::GemmStreamkModeEnum::SK> >, bytedance::flux::None, cute::tuple<cute::C<128>, cute::C<128>, cute::C<64> >, cute::C<bytedance::flux::GemmKindEnum::GemmStreamK>, cute::C<3>, cute::C<bytedance::flux::GemmRasterOrderEnum::AlongM>}; bytedance::flux::OpRegistry::OpCreator = std::function<std::unique_ptr<bytedance::flux::GemmOperatorBase>()>]’
/flux/build/src/ag_gemm/registers/flux_bf16_bf16_bf16_bf16_fp32_fp32_sm89_agkernel_rrr_gemmv2_false_nil.cu:585:135: required from here
/flux/include/flux/flux.h:1047:34: error: no match for ‘operator<’ (operand types are ‘const cute::tuple<long int, long int, long int>’ and ‘const cute::tuple<long int, long int, long int>’)
1047 | return bool(cute::get<I>(t) < cute::get<I>(u)) ||
| ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note: candidate: ‘template<auto t, auto u> constexpr cute::C<(t < u)> cute::operator<(cute::C<v>, cute::C<v>)’
237 | CUTE_BINARY_OP( <);
| ^~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note: template argument deduction/substitution failed:
/flux/include/flux/flux.h:1047:34: note: ‘cute::tuple<long int, long int, long int>’ is not derived from ‘cute::C<v>’
1047 | return bool(cute::get<I>(t) < cute::get<I>(u)) ||
| ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/flux/include/flux/flux.h:1047:88: error: no match for ‘operator<’ (operand types are ‘const cute::tuple<long int, long int, long int>’ and ‘const cute::tuple<long int, long int, long int>’)
1047 | return bool(cute::get<I>(t) < cute::get<I>(u)) ||
| ^
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note: candidate: ‘template<auto t, auto u> constexpr cute::C<(t < u)> cute::operator<(cute::C<v>, cute::C<v>)’
237 | CUTE_BINARY_OP( <);
| ^~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note: template argument deduction/substitution failed:
/flux/include/flux/flux.h:1047:88: note: ‘cute::tuple<long int, long int, long int>’ is not derived from ‘cute::C<v>’
1047 | return bool(cute::get<I>(t) < cute::get<I>(u)) ||
| ^
make[2]: *** [src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/build.make:92: src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/registers/flux_bf16_bf16_bf16_bf16_fp32_fp32_sm89_agkernel_rrr_gemmv2_false_nil.cu.o] Error 1
make[2]: Leaving directory '/flux/build'
make[1]: *** [CMakeFiles/Makefile2:565: src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/all] Error 2
make[1]: Leaving directory '/flux/build'
make: *** [Makefile:136: all] Error 2
+ merge_compile_commands
+ cd /flux
+ command -v ninja
++ ls './build/temp.*/build.ninja'
ls: cannot access './build/temp.*/build.ninja': No such file or directory
+ ninja -f -t compdb
ninja: error: loading '-t': No such file or directory
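The errors point at a lexicographic tuple comparison in flux.h (line 1047) that applies `<` to elements which are themselves `cute::tuple<long int, long int, long int>`; with the CUTLASS headers used in this build, no `operator<` matches that type. A comparison that recurses into nested tuples avoids needing one. The sketch below shows that technique with `std::tuple` as a stand-in for `cute::tuple` (`std::tuple` already has `operator<`, so it only keeps the example self-contained). `lex_less` and its helpers are hypothetical names, and this is not necessarily how the eventual fix was written.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Trait: is T a std::tuple? (std::tuple stands in for cute::tuple here.)
template <typename T>
struct is_tuple : std::false_type {};
template <typename... Ts>
struct is_tuple<std::tuple<Ts...>> : std::true_type {};

// Forward declaration so element comparison can recurse into nested tuples.
template <typename T, typename U>
constexpr bool lex_less(const T& t, const U& u);

// Compare one element: recurse if it is itself a tuple, otherwise use `<`
// (which is defined for the integer leaves).
template <typename A, typename B>
constexpr bool elem_less(const A& a, const B& b) {
  if constexpr (is_tuple<A>::value) {
    return lex_less(a, b);
  } else {
    return a < b;
  }
}

// Lexicographic comparison from element I on: t < u if t[I] < u[I], or if
// neither t[I] < u[I] nor u[I] < t[I] and the remaining elements compare less.
template <std::size_t I, typename T, typename U>
constexpr bool lex_less_from(const T& t, const U& u) {
  if constexpr (I == std::tuple_size<T>::value) {
    return false;  // all elements equivalent, so t is not less than u
  } else {
    return elem_less(std::get<I>(t), std::get<I>(u)) ||
           (!elem_less(std::get<I>(u), std::get<I>(t)) &&
            lex_less_from<I + 1>(t, u));
  }
}

template <typename T, typename U>
constexpr bool lex_less(const T& t, const U& u) {
  static_assert(std::tuple_size<T>::value == std::tuple_size<U>::value,
                "tuples must have the same length");
  return lex_less_from<0>(t, u);
}
```

Recursing per element keeps the comparison a strict weak ordering, which is what an ordered container keyed on such tuples (for instance an operator registry) requires.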
Try cleaning your workspace, then follow README.md and recompile.
Note that you have to run install_deps.sh before build.sh: https://github.com/bytedance/flux/blob/main/install_deps.sh
Also make sure you compile with the right image. We use NVCC 12.4 + GCC 12.
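A minimal sketch of a compile-time guard that stops the build early when the toolchain differs from the NVCC 12.4 + GCC 12 combination mentioned above. `matches_reference_toolchain` is a hypothetical helper, not part of flux. The check assumes nvcc, which defines `__CUDACC_VER_MAJOR__` and `__CUDACC_VER_MINOR__`, while `__GNUC__` reflects the host compiler (Clang also defines `__GNUC__`, as 4, so it would be rejected).

```cpp
// Hypothetical helper (not part of flux): true if the given versions match
// the reference toolchain the maintainers build with (NVCC 12.4 + GCC 12).
constexpr bool matches_reference_toolchain(int nvcc_major, int nvcc_minor,
                                           int gcc_major) {
  return nvcc_major == 12 && nvcc_minor == 4 && gcc_major == 12;
}

// Only active when compiled by nvcc; a plain host compile skips this block.
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__) && \
    defined(__GNUC__)
static_assert(matches_reference_toolchain(__CUDACC_VER_MAJOR__,
                                          __CUDACC_VER_MINOR__, __GNUC__),
              "toolchain differs from the NVCC 12.4 + GCC 12 reference");
#endif
```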
Do you use a base image, for example one from NGC? @houqi
Maybe something is wrong. Here is a related issue: https://github.com/bytedance/flux/issues/60
@ZSL98 is working on it.
Fixed by https://github.com/bytedance/flux/pull/66. Please give it a try.