long-context-attention
Is there example of how to use the hybrid-sp in Megatron-LM?
Please refer to this PR in FlagScale, a framework built on top of Megatron-LM:
https://github.com/FlagOpen/FlagScale/pull/156
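For orientation before reading the PR, here is a minimal sketch of driving this repo's hybrid sequence parallelism (Ulysses x Ring) directly, following the names in the yunchang README. The Megatron-LM-specific wiring (where to swap in the module for Megatron's core attention) is what the FlagScale PR shows authoritatively, so treat the degrees, shapes, and dtypes below as illustrative assumptions.

```python
import torch
import torch.distributed as dist
from yunchang import LongContextAttention, set_seq_parallel_pg

dist.init_process_group(backend="nccl")
rank, world_size = dist.get_rank(), dist.get_world_size()

# Split the sequence-parallel group into Ulysses and Ring sub-groups.
sp_ulysses_degree = 2                              # assumed: divides world_size
sp_ring_degree = world_size // sp_ulysses_degree
set_seq_parallel_pg(sp_ulysses_degree, sp_ring_degree, rank, world_size)

# The hybrid attention module; in a Megatron-LM integration this replaces
# the core attention computation inside each transformer layer.
hybrid_attn = LongContextAttention(ring_impl_type="zigzag")

# Each rank holds a 1/world_size shard of the sequence dimension:
# (batch, seqlen // world_size, num_heads, head_dim). Sizes are made up.
bs, local_seqlen, nheads, hdim = 1, 1024, 16, 64
q = torch.randn(bs, local_seqlen, nheads, hdim,
                device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Output comes back in the same sequence-sharded layout as q.
local_out = hybrid_attn(q, k, v, causal=True)
```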