Zhewen Li

Results: 2 issues by Zhewen Li

The recent change in PR #15734 adds a full_scales tensor to the call site in rocm_flash_attn.py. However, _attention.forward in attention/ops/triton_flash_attention.py still accepts only 12 positional arguments. This mismatch causes: ```...

Label: needs-rebase
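
The failure mode in the first issue is a plain positional-argument mismatch between a call site and the function it calls. The sketch below reduces that pattern to a few lines; the class name mirrors `_attention`, but the parameter list is a shortened, hypothetical stand-in, not the actual 12-argument signature in attention/ops/triton_flash_attention.py or the call in rocm_flash_attn.py.

```python
# Illustrative reduction of the mismatch described above. The real forward()
# in vllm/attention/ops/triton_flash_attention.py takes 12 positional
# arguments; this shorter, hypothetical signature stands in for it.
import torch


class _attention(torch.autograd.Function):
    @staticmethod
    def forward(ctx, q, k, v, o, metadata):
        # A real Triton kernel launch would happen here; the stub returns o.
        return o


q = k = v = o = torch.zeros(1)
metadata = object()
full_scales = (1.0,) * 5  # the extra argument newly passed at the call site

try:
    # One more positional argument than forward() declares:
    _attention.apply(q, k, v, o, metadata, full_scales)
except TypeError as err:
    # e.g. "forward() takes 6 positional arguments but 7 were given"
    print(err)
```

Either the forward signature has to grow a matching parameter (with a default so older call sites keep working) or the call site has to drop the extra argument; passing it positionally as-is fails before the kernel is ever launched.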

## Purpose
AMD CI is using mi325, but the MoE config is not added:
```
WARNING [fused_moe.py:886] Using default MoE config. Performance might be sub-optimal! Config file not found at...
```

Labels: ready, llama
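
The warning in the second issue comes from a tuned-config lookup that falls back to generic defaults when no per-device file is found. The sketch below shows that fallback pattern under assumed names: the filename scheme, the `configs` directory, and the default block sizes are placeholders, not the exact values used by fused_moe.py.

```python
# Rough sketch of the config-lookup-with-fallback pattern behind the warning
# quoted above. File naming, paths, and default values here are illustrative,
# not the exact ones in vllm/model_executor/layers/fused_moe/fused_moe.py.
import json
import logging
import os

logger = logging.getLogger("fused_moe")


def get_moe_config(num_experts: int, shard_size: int, device_name: str) -> dict:
    # Tuned configs are stored as one JSON file per (experts, size, device) combo.
    filename = f"E={num_experts},N={shard_size},device_name={device_name}.json"
    path = os.path.join("configs", filename)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    # No tuned config for this device (e.g. a new CI machine such as mi325):
    # fall back to a generic kernel config and warn, as in the log above.
    logger.warning(
        "Using default MoE config. Performance might be sub-optimal! "
        "Config file not found at %s", path)
    return {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 32,
            "GROUP_SIZE_M": 8}


# Example lookup for a hypothetical MI325 runner: with no JSON file present,
# this logs the warning and returns the generic defaults.
print(get_moe_config(8, 14336, "AMD_Instinct_MI325X"))
```

Fixing the issue therefore means adding (or tuning and committing) a config file whose name matches the new device, so the lookup succeeds and the fallback path is never taken on that CI machine.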