侯奇
> Thanks, > > I add 86 arguments to `/flux/src/cuda/op_registery.cu` line 36 like this: > > ```cuda-c++ > void > init_arch_tag() { > int major, minor; > cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, 0);...
> Does this work? I tried modifying these parts, but it still reports errors after the changes. Could you provide more specific guidance on how to modify it? I run it...
Fixed by https://github.com/bytedance/flux/pull/123. Closing this.
> > [@zkyue](https://github.com/zkyue) Yes, fp8 support is on the way. And we will release it in future. > > Thank you for your reply. I am now looking to apply...
It's not tested on RoCE NICs. Maybe this is a problem with NVSHMEM. Can you run the NVSHMEM examples with nvshmrun on a RoCE NIC?
Check the README.md and run install_deps.sh. There is a CUTLASS patch which helps.
```
git clone --recursive https://github.com/bytedance/flux.git && cd flux
# Install dependencies
bash ./install_deps.sh
# For Ampere(sm80)...
```
@ZSL98 could you please help?
> The problem seems to be the improper gcc version. gcc10 and gcc12 both work but gcc11.4 fails. If you are using gcc11, please comment out the `using cute::operator BTW,...
Can we release the version compiled without NVSHMEM and let users compile with NVSHMEM themselves?
It seems the SOL time is calculated without dividing by TP_SIZE, so it is 8x larger than it should be. This should be fixed later.
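
To illustrate the effect, here is a minimal sketch, not the actual Flux benchmark code. It assumes the SOL (speed-of-light) time is estimated as GEMM FLOPs divided by peak throughput; the function name `gemm_sol_time_ms` and all parameter values are hypothetical.

```python
def gemm_sol_time_ms(m: int, n: int, k: int, tp_size: int, peak_tflops: float) -> float:
    """Ideal per-rank GEMM time in milliseconds (hypothetical helper).

    Assumption: the full M x N x K GEMM is split evenly across tp_size ranks,
    so each rank performs 1/tp_size of the total FLOPs.
    """
    if tp_size <= 0 or peak_tflops <= 0:
        raise ValueError("tp_size and peak_tflops must be positive")
    total_flops = 2.0 * m * n * k          # one multiply + one add per MAC
    per_rank_flops = total_flops / tp_size  # the division the comment says is missing
    seconds = per_rank_flops / (peak_tflops * 1e12)
    return seconds * 1e3


# Illustrative numbers only (not measured on any device).
wrong = gemm_sol_time_ms(4096, 8192, 8192, tp_size=1, peak_tflops=300.0)  # division skipped
right = gemm_sol_time_ms(4096, 8192, 8192, tp_size=8, peak_tflops=300.0)
print(f"ratio without/with TP division: {wrong / right:.1f}")
```

With a TP size of 8, skipping the division inflates the estimate by exactly that factor, which matches the 8x discrepancy described above.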