TransformerEngine

AttributeError: module 'transformer_engine' has no attribute 'pytorch'

Open Lzhang-hub opened this issue 1 year ago • 3 comments

I reinstalled flash-attn (pip install flash-attn==2.6.1) in the NGC PyTorch Docker image 24.06. When I run a training job, I get the following error:

Traceback (most recent call last):
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/pretrain_gpt.py", line 8, in <module>
    from megatron.training import get_args
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/training/__init__.py", line 16, in <module>
    from .initialize  import initialize_megatron
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/training/initialize.py", line 18, in <module>
    from megatron.training.arguments import parse_args, validate_args
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/training/arguments.py", line 13, in <module>
    from megatron.core.models.retro.utils import (
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/models/retro/__init__.py", line 12, in <module>
    from .decoder_spec import get_retro_decoder_block_spec
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/models/retro/decoder_spec.py", line 9, in <module>
    from megatron.core.models.gpt.gpt_layer_specs import (
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/models/gpt/__init__.py", line 1, in <module>
    from .gpt_model import GPTModel
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 17, in <module>
    from megatron.core.transformer.transformer_block import TransformerBlock
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/transformer/transformer_block.py", line 16, in <module>
    from megatron.core.transformer.custom_layers.transformer_engine import (
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/transformer/custom_layers/transformer_engine.py", line 80, in <module>
    class TELinear(te.pytorch.Linear):
AttributeError: module 'transformer_engine' has no attribute 'pytorch'
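
The AttributeError above only says that the pytorch submodule never got attached to the transformer_engine package; it does not say why. One possible explanation, which is an assumption and not confirmed in this report, is that the submodule import failed at package load time and the failure was swallowed. A minimal diagnostic sketch, using only the standard library, is to import the submodule directly so that the underlying exception becomes visible. The helper name probe_submodule is hypothetical and is not part of TransformerEngine or Megatron-LM.

```python
import importlib
import traceback


def probe_submodule(package: str, submodule: str):
    """Import package.submodule directly and report the real failure, if any.

    Hypothetical helper for illustration; returns (ok, detail).
    """
    full_name = f"{package}.{submodule}"
    try:
        importlib.import_module(full_name)
    except Exception:
        # Broad on purpose: the goal is to show whatever went wrong,
        # e.g. a missing dependency or a binary incompatibility.
        return False, traceback.format_exc()
    return True, f"{full_name} imported successfully"


if __name__ == "__main__":
    ok, detail = probe_submodule("transformer_engine", "pytorch")
    print(detail)
```

If the direct import fails, the printed traceback should point at the package that actually broke, which may or may not be the reinstalled flash-attn.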

Lzhang-hub, Jul 15 '24 03:07