Results: 2 issues of Zhewen Li
The recent change in PR #15734 adds a full_scales tensor to the call site in rocm_flash_attn.py. However, _attention.forward in attention/ops/triton_flash_attention.py still accepts only 12 positional arguments. This mismatch causes: ```...
Labels: needs-rebase
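The failure mode in the issue above is a plain positional-argument mismatch. A minimal, self-contained sketch of how that surfaces at runtime, using hypothetical signatures rather than the real `_attention.forward` (which takes 12 positional arguments):

```python
# Hypothetical 4-argument callee standing in for _attention.forward;
# the call site passes one more positional argument (full_scales) than
# the callee's signature accepts.
def forward(q, k, v, softmax_scale):
    """Hypothetical kernel wrapper with a fixed positional signature."""
    return (q, k, v, softmax_scale)

full_scales = [1.0, 1.0]  # stands in for the tensor added at the call site

try:
    forward("q", "k", "v", 0.5, full_scales)  # one positional argument too many
except TypeError as err:
    # e.g. "forward() takes 4 positional arguments but 5 were given"
    print(err)
```

The fix is to keep the call site and the callee's signature in sync, either by extending the signature to accept the new tensor or by not passing it from rocm_flash_attn.py.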
## Purpose

AMD CI is using mi325, but the MoE config is not added:

```
WARNING [fused_moe.py:886] Using default MoE config. Performance might be sub-optimal! Config file not found at...
```
Labels: ready, llama
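The warning quoted in that PR comes from a lookup-then-fallback path: if no tuned per-device config file exists, a generic default kernel config is used. A rough sketch of that behavior, with assumed file naming and config keys rather than vLLM's exact code in fused_moe.py:

```python
# Simplified, assumed version of the MoE config lookup that produces the
# warning above: load a per-device tuned JSON file if present, otherwise
# warn and fall back to a generic default config.
import json
import logging
import os

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("fused_moe")


def load_moe_config(config_path: str) -> dict:
    if os.path.isfile(config_path):
        with open(config_path) as f:
            return json.load(f)  # tuned kernel config for this GPU
    logger.warning(
        "Using default MoE config. Performance might be sub-optimal! "
        "Config file not found at %s", config_path,
    )
    return {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64, "num_warps": 4}  # fallback


# Hypothetical file name for an MI325 config; the real naming scheme is
# defined by fused_moe.py, not by this sketch.
print(load_moe_config("E=8,N=14336,device_name=AMD_Instinct_MI325X.json"))
```

Adding the tuned JSON config for the mi325 device name removes the warning and lets the fused MoE kernel use device-specific tile sizes instead of the fallback values.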