xu-yfei
Results: 2 issues of xu-yfei
## Motivation
This PR is for DP MLA (#5001). About DP MLA: on 8×H20 (96 GB), weight memory usage is 87.19 GB with `--dp-size 4 --enable-dp-attention`, which does not leave enough memory. This optimization is...
## Motivation
Based on the dp_mla_kernel PR #5000. **Description**: On 8×H20 (96 GB), weight memory usage is 87.19 GB with `--dp-size 4 --enable-dp-attention`, which does not leave enough memory. This optimization is similar to data parallelism...