侯奇

45 comments by 侯奇

> Thanks,
>
> I add 86 arguments to `/flux/src/cuda/op_registery.cu` line 36 like this:
>
> ```cuda-c++
> void
> init_arch_tag() {
>   int major, minor;
>   cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, 0);...
> ```
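
For context, a compute capability such as 8.6 is commonly folded into a single number (86) before it is matched against the architectures a build supports. The sketch below illustrates only that step, in plain C++ without the CUDA runtime. `arch_number`, `is_supported_arch`, and `kSupportedArchs` are hypothetical names and values chosen for illustration, not taken from flux; in real code `major` and `minor` would come from `cudaDeviceGetAttribute`, as in the quoted snippet.

```cpp
#include <algorithm>
#include <array>

// Hypothetical helper: fold a (major, minor) compute capability into one
// number, e.g. (8, 6) -> 86.
constexpr int arch_number(int major, int minor) {
  return major * 10 + minor;
}

// Hypothetical list of architectures a build might register kernels for.
// This is an assumption for illustration, not flux's actual list.
constexpr std::array<int, 4> kSupportedArchs = {80, 86, 89, 90};

// Returns true if the device's compute capability is in the list above.
bool is_supported_arch(int major, int minor) {
  const int arch = arch_number(major, minor);
  return std::find(kSupportedArchs.begin(), kSupportedArchs.end(), arch) !=
         kSupportedArchs.end();
}
```

With this layout, supporting a new architecture amounts to adding its number to the list, provided kernels for it are actually compiled.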

> Does this work? I tried modifying these parts, but it still reports errors after the changes. Could you provide more specific guidance on how to modify it? I run it...

Fixed by https://github.com/bytedance/flux/pull/123. Closing this.

> > [@zkyue](https://github.com/zkyue) Yes, fp8 support is on the way. And we will release it in future.
>
> Thank you for your reply. I am now looking to apply...

It has not been tested on a RoCE NIC. This may be a problem with NVSHMEM. Can you run the NVSHMEM examples with nvshmrun on the RoCE NIC?

Check the README.md and run install_deps.sh. There is a CUTLASS patch that helps.

```
git clone --recursive https://github.com/bytedance/flux.git && cd flux
# Install dependencies
bash ./install_deps.sh
# For Ampere(sm80)...
```

> The problem seems to be the improper gcc version. gcc10 and gcc12 both work but gcc11.4 fails. If you are using gcc11, please comment out the `using cute::operator BTW,...

Can we release a version compiled without NVSHMEM and let users compile with NVSHMEM themselves?

It seems the SOL time is calculated without dividing by TP_SIZE, so it is 8x larger than it should be. This should be fixed later.