flash-linear-attention
[RFC] Implement model-specific 4d parallelism
Proposal
- We want to add `apply_tp` and `apply_cp` functions for each model, since their layer definitions can vary (see the sketch below).
Also see comments in https://github.com/fla-org/flame/issues/4
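
For illustration, here is a minimal sketch of what a per-model `apply_tp` hook could look like on top of PyTorch's DTensor tensor-parallel API (`parallelize_module` with `ColwiseParallel`/`RowwiseParallel`). All submodule paths (`model.model.layers`, `attn.q_proj`, `mlp.up_proj`, ...) are hypothetical placeholders; the concrete plan is exactly what varies per model and is why each model needs its own function:

```python
import torch.nn as nn
from torch.distributed.device_mesh import DeviceMesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


def apply_tp(model: nn.Module, tp_mesh: DeviceMesh) -> nn.Module:
    """Shard one hypothetical model over the TP mesh.

    NOTE: the submodule paths below are placeholders; each model
    would ship its own plan, which is the point of this RFC.
    """
    for block in model.model.layers:  # hypothetical layer container
        plan = {
            # input projections are sharded column-wise ...
            "attn.q_proj": ColwiseParallel(),
            "attn.k_proj": ColwiseParallel(),
            "attn.v_proj": ColwiseParallel(),
            # ... and output projections row-wise, so each block
            # needs only a single all-reduce on the way out
            "attn.o_proj": RowwiseParallel(),
            "mlp.gate_proj": ColwiseParallel(),
            "mlp.up_proj": ColwiseParallel(),
            "mlp.down_proj": RowwiseParallel(),
        }
        parallelize_module(block, tp_mesh, plan)
    return model
```

An `apply_cp` counterpart would analogously declare how each model shards the sequence dimension for context parallelism; those hooks depend even more heavily on each model's attention or recurrence implementation.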