TransformerEngine
'TEDotProductAttention' object has no attribute 'tp_group_initialized'
This stopped working after updating Megatron-LM to the main branch of TE.
Can you provide more information or a minimal reproducer?
This error suggests that the tensor-parallel group has not been properly configured. If you are using one of Megatron-LM's TE wrappers, the TP group must either be initialized prior to creating the layer (with megatron.core.parallel_state.initialize_model_parallel) or registered after creating the layer (with TransformerEngineBaseModule.set_tensor_parallel_group, see this Megatron-LM comment).
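The second remedy above amounts to walking the model after construction and registering the TP process group on every TE submodule that supports it. A minimal sketch of that pattern follows; note that the `_Module`/`_TEModule` classes are stand-ins for `torch.nn.Module` and TE modules so the example is self-contained, and `register_tp_group` is a hypothetical helper, not a TE or Megatron-LM API. In real code the group would come from `megatron.core.parallel_state` and the modules would be TE layers such as TEDotProductAttention.

```python
# Sketch of the post-construction workaround: traverse submodules and call
# set_tensor_parallel_group() on every module that exposes it.
# _Module / _TEModule are mocks standing in for torch.nn.Module / TE modules.

class _Module:
    """Minimal stand-in for torch.nn.Module with submodule traversal."""
    def __init__(self):
        self._children = []

    def add(self, child):
        self._children.append(child)
        return child

    def modules(self):
        # Yield self and all descendants, like torch.nn.Module.modules().
        yield self
        for child in self._children:
            yield from child.modules()

class _TEModule(_Module):
    """Stand-in for a TE module (e.g. TEDotProductAttention)."""
    def __init__(self):
        super().__init__()
        self.tp_group = None

    def set_tensor_parallel_group(self, tp_group):
        # The real TE method stores the group for later TP collectives.
        self.tp_group = tp_group

def register_tp_group(model, tp_group):
    """Hypothetical helper: register tp_group on every submodule that
    supports it; returns how many modules were updated."""
    count = 0
    for module in model.modules():
        if hasattr(module, "set_tensor_parallel_group"):
            module.set_tensor_parallel_group(tp_group)
            count += 1
    return count

model = _Module()
attn = model.add(_TEModule())
mlp = model.add(_TEModule())
updated = register_tp_group(model, tp_group="fake-tp-group")
print(updated)         # 2
print(attn.tp_group)   # fake-tp-group
```

If TEDotProductAttention indeed never receives the group through either path, this kind of explicit post-construction registration is a reasonable stopgap until the wrapper is fixed upstream.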
It seems that TEDotProductAttention never calls set_tensor_parallel_group, so the group is never registered on it.