zjing14


Enabled bf16 atomic_add on MI300

Summary:
```
buck2 run @//mode/opt-amd-gpu -c fbcode.rocm_arch=mi300 --modifier ovr_config//third-party/rocm/constraints:6.0.1 //deeplearning/fbgemm/fbgemm_gpu/experimental/gen_ai/bench:quantize_bench -- --enable_amd_env_vars --kernels=ck_rowwise --N 3584 --M 8192 --K 9728 --use_rotating_buffer_bench
ck_rowwise sim: 13.812.
ck_rowwise ms: 0.558.
ck_rowwise TFLOPS: 1022.833.
ck_rowwise...
```
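As a rough consistency check on the numbers above, the latency and throughput can be related with a short shell sketch. This assumes that quantize_bench uses the usual GEMM operation count of 2*M*N*K floating-point operations, which is an assumption because the summary does not state the formula. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch only: ASSUMES TFLOPS = 2*M*N*K / time (standard GEMM FLOP count);
# this formula is not confirmed by the PR summary.
M=8192
N=3584
K=9728
MS=0.558   # reported ck_rowwise latency in milliseconds (already rounded)

TFLOPS=$(awk -v m="$M" -v n="$N" -v k="$K" -v ms="$MS" 'BEGIN {
  flops = 2 * m * n * k               # each multiply-add counted as 2 FLOPs
  printf "%.1f", flops / (ms * 1e-3) / 1e12
}')
echo "$TFLOPS"
```

This prints 1023.7, close to the reported 1022.833. If the formula assumption holds, the small gap comes from the 0.558 ms figure being rounded. The unrounded latency implied by 1022.833 TFLOPS would be about 0.5585 ms.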

Labels: fb-exported, cla signed