[QUESTION] Does TP overlap support variable sequence length?

Open wplf opened this issue 1 year ago • 7 comments

Hi, thank you for the great work. I'd like to ask whether TP overlap supports variable sequence lengths.

wplf avatar Nov 01 '24 09:11 wplf

TP overlap currently requires sequence parallelism and does not have any attention layout/format restrictions except that the sequence length has to be constant and evenly divisible by TP size.

Since you asked about the format, I want to clarify that we currently do not support comm+GEMM overlap in the attention mechanism. TP overlap is restricted to the te.Linear, te.LayerNormLinear and te.LayerNormMLP modules.

denera avatar Nov 05 '24 23:11 denera

Thank you very much. I'd like to ask another question. Is there any way to bypass the constraint that the sequence length has to be constant and evenly divisible by the TP size? I'd like to overlap TP/SP for the thd format, in which the sequence length is variable.

wplf avatar Nov 06 '24 02:11 wplf

Unfortunately the current implementation does not support variable sequence lengths, so you would have to pad your sequences up to a static maximum. Theoretically there is no reason why it couldn't be done, but the custom communication kernels we use for TP overlap have far too many hard-coded assumptions about buffer and work chunk sizes to strip out easily in practice.
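As a rough illustration of the padding workaround described above, here is a minimal sketch in plain Python. It assumes token-ID sequences stored as lists; the helper names `padded_length` and `pad_batch` and the `pad_id` default are hypothetical and not part of Transformer Engine. The static length should be chosen once from the longest sequence you expect and reused for every step, since TP overlap needs it to stay constant.

```python
from typing import List, Sequence


def padded_length(max_seq_len: int, tp_size: int) -> int:
    """Round max_seq_len up to the nearest multiple of tp_size."""
    if max_seq_len <= 0 or tp_size <= 0:
        raise ValueError("max_seq_len and tp_size must be positive")
    return ((max_seq_len + tp_size - 1) // tp_size) * tp_size


def pad_batch(
    seqs: Sequence[Sequence[int]], static_len: int, pad_id: int = 0
) -> List[List[int]]:
    """Right-pad every sequence to the same static length."""
    padded = []
    for seq in seqs:
        if len(seq) > static_len:
            raise ValueError(
                f"sequence of length {len(seq)} exceeds static length {static_len}"
            )
        padded.append(list(seq) + [pad_id] * (static_len - len(seq)))
    return padded


# Hypothetical usage: longest expected sequence is 1001 tokens, TP size is 8.
static_len = padded_length(1001, tp_size=8)
batch = pad_batch([[5, 6, 7], [8]], static_len)
```

The padded positions would still need to be masked out in attention and in the loss, which this sketch does not cover.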

We do plan to support this in the near future, after we migrate the TP overlap functionality to the latest cuBLASMp v0.3.0 release, which introduced support for collective GEMM with overlapped communication (these are NVSHMEM-based re-implementations of the same TP overlap algorithms in Transformer Engine).

denera avatar Nov 06 '24 04:11 denera

Thanks again for your great work. Could you tell me the ETA for this feature?

wplf avatar Nov 06 '24 06:11 wplf

I hope to integrate cuBLASMp into TE by mid-December at the latest. There's a chance this might support variable sequence lengths out of the box, but otherwise it would have to wait until at least January if not later, depending on where this feature lands on our list of priorities.

denera avatar Nov 06 '24 06:11 denera

Hello, denera. Is this supported now?

Thank you for your time. Best regards.

wplf avatar Jan 14 '25 06:01 wplf

@denera Sorry to bother you.

Does TP overlap support variable sequence lengths now?

Thank you very much.

wplf avatar Feb 24 '25 09:02 wplf