TransformerEngine
'TEDotProductAttention' object has no attribute 'tp_group_initialized'
This stopped working after updating Megatron-LM to the main branch of TE.
Can you provide more information or a minimal reproducer?
This error suggests that the tensor-parallel group has not been properly configured. If you are using one of Megatron-LM's TE wrappers, the TP group must either be initialized prior to creating the layer (with megatron.core.parallel_state.initialize_model_parallel) or registered after creating the layer (with TransformerEngineBaseModule.set_tensor_parallel_group, see this Megatron-LM comment).
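The second remedy above amounts to walking the model after construction and registering the TP process group on every TE submodule that supports it. A minimal sketch of that pattern follows; note that the `_Module`/`_TEModule` classes are stand-ins for `torch.nn.Module` and TE modules so the example is self-contained, and `register_tp_group` is a hypothetical helper, not a TE or Megatron-LM API. In real code the group would come from `megatron.core.parallel_state` and the modules would be TE layers such as TEDotProductAttention.

```python
# Sketch of the post-construction workaround: traverse submodules and call
# set_tensor_parallel_group() on every module that exposes it.
# _Module / _TEModule are mocks standing in for torch.nn.Module / TE modules.

class _Module:
    """Minimal stand-in for torch.nn.Module with submodule traversal."""
    def __init__(self):
        self._children = []

    def add(self, child):
        self._children.append(child)
        return child

    def modules(self):
        # Yield self and all descendants, like torch.nn.Module.modules().
        yield self
        for child in self._children:
            yield from child.modules()

class _TEModule(_Module):
    """Stand-in for a TE module (e.g. TEDotProductAttention)."""
    def __init__(self):
        super().__init__()
        self.tp_group = None

    def set_tensor_parallel_group(self, tp_group):
        # The real TE method stores the group for later TP collectives.
        self.tp_group = tp_group

def register_tp_group(model, tp_group):
    """Hypothetical helper: register tp_group on every submodule that
    supports it; returns how many modules were updated."""
    count = 0
    for module in model.modules():
        if hasattr(module, "set_tensor_parallel_group"):
            module.set_tensor_parallel_group(tp_group)
            count += 1
    return count

model = _Module()
attn = model.add(_TEModule())
mlp = model.add(_TEModule())
updated = register_tp_group(model, tp_group="fake-tp-group")
print(updated)         # 2
print(attn.tp_group)   # fake-tp-group
```

If TEDotProductAttention indeed never receives the group through either path, this kind of explicit post-construction registration is a reasonable stopgap until the wrapper is fixed upstream.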
It seems that TEDotProductAttention never calls set_tensor_parallel_group, so the group is never registered on it.