侯奇
It can be solved, but it is not an easy fix. You have to: * put the shape fully on the device side * call no CUDA runtime API and use kernels for...
It seems there is a bug. I will release the latest version later.
Thanks for your comment. FLUX has not been tested with CUDA 12.9. We will fix it later.
Strange that you ran the build with --arch 89 but got an error complaining about `bytedance::flux::GemmV3Meta::check_type() const`. GemmV3Meta is for Hopper only. Did you build the repo once with --arch 90,...
Maybe it's because the Ada FP8 implementation has no non-fast accumulation mode.