侯奇
@zheng-ningxin Is fuse_reduction broken on sm90?
> cmd: $ CUDA_LAUNCH_BLOCKING=1 /vllm-workspace/flux/scripts/launch.sh /vllm-workspace/flux/test/test_ag_kernel_pyshmem.py 4096 72 512 --dtype=float16 --iters=10  Sorry, I'm not aware that FLUX has a file called `test/test_ag_kernel_pyshmem.py`. Besides, 72 for N is too...
Please provide more information, such as your hardware and software setup. Note that N is too small here, so FLUX may not perform well. We will fix this later.
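A minimal sketch of how the requested hardware/software details could be gathered for a report. It is not part of FLUX; the helper names `run_or_none` and `collect_env_report` are hypothetical, and it only assumes that `nvcc`, `gcc`, and `nvidia-smi` may or may not be on `PATH`.

```python
import platform
import shutil
import subprocess
import sys


def run_or_none(cmd):
    """Return the stdout of cmd, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def collect_env_report():
    """Gather basic hardware/software details (hypothetical helper, not a FLUX API)."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "nvcc": run_or_none(["nvcc", "--version"]),
        "gcc": run_or_none(["gcc", "--version"]),
        "gpus": run_or_none(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
        ),
    }


if __name__ == "__main__":
    # Print one line per item so the output can be pasted into an issue.
    for key, value in collect_env_report().items():
        print(f"{key}: {value if value is not None else 'not found'}")
```

Pasting this output together with the exact command line makes it easier to tell a build mismatch from a shape problem such as a very small N.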
Most of them are not tuned; you can tune them yourself. For GEMM+RS and AG+GEMM, use the tools here: https://github.com/bytedance/flux/tree/main/tools. For MoE-related kernels there are no tools yet. A PR is welcome.
> > We recently open-sourced the MoE part, and the related tuning script can be found [here](https://github.com/bytedance/flux/pull/59). You can use that for reference. > > I pulled the latest code...
> > Also make sure you compile with the right image. We use NVCC 12.4 + gcc 12. > > Do you use basic images? For example, those like NGC?...
Maybe you can clean the build directory, recompile, and run again?
We do not test with Ray. It seems that Ray limits each process to only 1 GPU?
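One way to check the guess above is to look at which GPUs the process is allowed to see. The sketch below assumes the restriction shows up through the `CUDA_VISIBLE_DEVICES` environment variable, which is how Ray is generally understood to assign GPUs to workers; `visible_gpu_ids` is a hypothetical helper, not a Ray or FLUX function.

```python
import os


def visible_gpu_ids(environ=None):
    """Return the device IDs listed in CUDA_VISIBLE_DEVICES, or None if it is unset.

    Hypothetical helper: None means no restriction is in place, an empty
    list means the process has been given no GPUs at all.
    """
    env = os.environ if environ is None else environ
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # unset: CUDA exposes every GPU on the machine
    # Split the comma-separated list and drop empty entries.
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    ids = visible_gpu_ids()
    if ids is None:
        print("CUDA_VISIBLE_DEVICES is unset: all GPUs are visible")
    else:
        print(f"{len(ids)} GPU(s) visible to this process: {ids}")
```

Running this inside the worker process would show whether it has been limited to a single device, in which case a multi-GPU kernel launched from that process cannot reach its peers.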
Sorry, this is yet to be added.