Do you compile with `--arch 90`?
@ZSL98 can you verify this?
There is no Python stack trace, so I guess there is a core dump. Can you set `ulimit -c unlimited` and then run again?
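If adjusting the shell limit is awkward (for example when the job is launched through a wrapper script), a rough equivalent is to raise the core-file limit from inside the Python entry point before the crash. This is just a sketch using the standard `resource` module, not anything FLUX-specific:

```python
import resource

# Roughly equivalent to `ulimit -c unlimited` for this process and its children:
# raise the soft core-file size limit to the current hard limit so a native
# crash leaves a core dump behind.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
```

Once the core file exists, loading it with `gdb <python-binary> <core-file>` and running `bt` gives the native backtrace that the Python traceback is missing.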
V2 and V3 are nearly the same; refer to https://github.com/bytedance/flux/blob/main/python/flux/cpp_mod.pyi#L649.
It seems no torch 2.4.0 wheel package was compiled. @zheng-ningxin
You have to use TE, and TE requires a specific torch version. That torch version is not compatible with FLUX, right? You can compile FLUX from source with the torch version...
> Additionally, based on what I’ve found, Flux works under the following conditions: **torch (2.4.0, 2.5.0, 2.6.0), python (3.10, 3.11), and cuda (12.4).** I’m currently building a Dockerfile and attempting...
> [@houqi](https://github.com/houqi) Thank you for your response. I'm currently trying to run pretraining using the repository below. Would it be possible for you to share a Dockerfile that works with...
Yes, it's possible in theory, but it is not implemented. I'm not sure that split-k is faster than stream-k; can you provide some cases where split-k is faster than stream-k?
Thanks for your report. You can refer to https://github.com/bytedance/flux/blob/main/src/generator/gen_moe_gather_rs.cc#L90 and add some arguments there:

```C++
// The concrete template arguments (the tile Shape and the cute::Int values)
// are in the linked file; they are omitted in this excerpt.
static constexpr auto AllGemmHParams_FP16 = make_space_gemm_hparams(
    cute::make_tuple(make_gemm_v3_hparams(Shape{})),
    cute::make_tuple(
        make_gather_rs_hparams(cute::Int{}, cute::Int{}),
        make_gather_rs_hparams(cute::Int{}, cute::Int{}),
        make_gather_rs_hparams(cute::Int{}, cute::Int{}),
        make_gather_rs_hparams(cute::Int{}, cute::Int{}),
        ...
```
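For illustration only, a fully specified entry fills in those template parameters, e.g. `make_gather_rs_hparams(cute::Int<1024>{}, cute::Int<8>{})`; the values here are hypothetical, not taken from the repo, so pick values consistent with the neighbouring entries in gen_moe_gather_rs.cc.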