
AttributeError: module 'transformer_engine' has no attribute 'pytorch'

Lzhang-hub opened this issue 1 year ago

I reinstalled flash-attn with pip install flash-attn==2.6.1 in the NGC PyTorch Docker image 24.06. When I run a training job, I get the following error:

Traceback (most recent call last):
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/pretrain_gpt.py", line 8, in <module>
    from megatron.training import get_args
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/training/__init__.py", line 16, in <module>
    from .initialize  import initialize_megatron
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/training/initialize.py", line 18, in <module>
    from megatron.training.arguments import parse_args, validate_args
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/training/arguments.py", line 13, in <module>
    from megatron.core.models.retro.utils import (
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/models/retro/__init__.py", line 12, in <module>
    from .decoder_spec import get_retro_decoder_block_spec
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/models/retro/decoder_spec.py", line 9, in <module>
    from megatron.core.models.gpt.gpt_layer_specs import (
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/models/gpt/__init__.py", line 1, in <module>
    from .gpt_model import GPTModel
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 17, in <module>
    from megatron.core.transformer.transformer_block import TransformerBlock
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/transformer/transformer_block.py", line 16, in <module>
    from megatron.core.transformer.custom_layers.transformer_engine import (
  File "/data1/nfs15/nfs/bigdata/zhanglei/ai-platform/hpc-test/multi-node-train/megatron-lm-train/Megatron-LM/20240411/Megatron-LM/megatron/core/transformer/custom_layers/transformer_engine.py", line 80, in <module>
    class TELinear(te.pytorch.Linear):
AttributeError: module 'transformer_engine' has no attribute 'pytorch'

Lzhang-hub avatar Jul 15 '24 03:07 Lzhang-hub

This looks like an import error, probably from Flash Attention. Our import logic has an unfortunate side effect of suppressing error messages (see https://github.com/NVIDIA/TransformerEngine/pull/862#pullrequestreview-2072546018), so can you try replacing import transformer_engine with import transformer_engine.pytorch?
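For reference, this diagnosis can be sketched as a tiny script (an assumption about your setup, not part of TE itself): importing the framework submodule directly surfaces the real ImportError instead of the misleading AttributeError.

```python
# Hedged diagnostic sketch: import transformer_engine.pytorch directly so the
# underlying ImportError is not swallowed by the root package's fallback logic.
try:
    import transformer_engine.pytorch  # noqa: F401
    diagnosis = "transformer_engine.pytorch imported cleanly"
except ImportError as err:
    # The real cause (e.g. a broken flash-attn install) appears here.
    diagnosis = f"underlying import failure: {err}"
print(diagnosis)
```

Running this in the failing environment should print the root-cause message instead of the AttributeError above.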

timmoon10 avatar Jul 15 '24 18:07 timmoon10

I'm having the same error. Replacing the import with import transformer_engine.pytorch changes the error to the one below. Can you give me any hint on how to solve this?

Traceback (most recent call last):
  File "/NeMo-Aligner/examples/nlp/gpt/train_gpt_sft.py", line 19, in <module>
    from nemo.collections.nlp.data.language_modeling.megatron.gpt_sft_chat_dataset import get_prompt_template_example
  File "/NeMo-Aligner/venv/lib/python3.10/site-packages/nemo/collections/nlp/__init__.py", line 15, in <module>
    from nemo.collections.nlp import data, losses, models, modules
  File "/NeMo-Aligner/venv/lib/python3.10/site-packages/nemo/collections/nlp/models/__init__.py", line 28, in <module>
    from nemo.collections.nlp.models.language_modeling import MegatronGPTPromptLearningModel
  File "/NeMo-Aligner/venv/lib/python3.10/site-packages/nemo/collections/nlp/models/language_modeling/__init__.py", line 16, in <module>
    from nemo.collections.nlp.models.language_modeling.megatron_gpt_prompt_learning_model import (
  File "/NeMo-Aligner/venv/lib/python3.10/site-packages/nemo/collections/nlp/models/language_modeling/megatron_gpt_prompt_learning_model.py", line 31, in <module>
    from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
  File "/NeMo-Aligner/venv/lib/python3.10/site-packages/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 41, in <module>
    from nemo.collections.nlp.models.language_modeling.megatron.falcon.falcon_spec import get_falcon_layer_spec
  File "/NeMo-Aligner/venv/lib/python3.10/site-packages/nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_spec.py", line 19, in <module>
    from megatron.core.transformer.attention import SelfAttention, SelfAttentionSubmodules
  File "/NeMo-Aligner/venv/lib/python3.10/site-packages/megatron/core/transformer/attention.py", line 12, in <module>
    from megatron.core.transformer.custom_layers.transformer_engine import SplitAlongDim
  File "/NeMo-Aligner/venv/lib/python3.10/site-packages/megatron/core/transformer/custom_layers/transformer_engine.py", line 7, in <module>
    import transformer_engine.pytorch as te
  File "/NeMo-Aligner/venv/lib/python3.10/site-packages/transformer_engine/pytorch/__init__.py", line 34, in <module>
    _load_library()
  File "/NeMo-Aligner/venv/lib/python3.10/site-packages/transformer_engine/pytorch/__init__.py", line 25, in _load_library
    so_path = next(so_dir.glob(f"transformer_engine_torch.*.{extension}"))
StopIteration

arelkeselbri avatar Aug 21 '24 12:08 arelkeselbri

same error

pizts avatar Sep 03 '24 12:09 pizts

Can you check if TE has built the required shared libraries? In particular, /NeMo-Aligner/venv/lib/python3.10/site-packages/transformer_engine should contain libtransformer_engine.so and something that looks like transformer_engine_torch.cpython-310-x86_64-linux-gnu.so.

If your TE install has libtransformer_engine.so but not transformer_engine_torch.*.so, then TE did not detect PyTorch during the build process. You can try forcing TE to build with PyTorch support by setting the NVTE_FRAMEWORK environment variable:

NVTE_FRAMEWORK=pytorch pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

See the TE install instructions.
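The check above can be scripted; this is a minimal sketch that mirrors TE's own glob-based library lookup. The site-packages path is the one from the traceback in this thread; adjust it for your environment (e.g. via importlib.util.find_spec("transformer_engine")).

```python
# Sketch: verify the TE build produced both shared libraries.
# The path below is an assumption taken from the traceback above.
import glob
import os

so_dir = "/NeMo-Aligner/venv/lib/python3.10/site-packages/transformer_engine"
core_lib = glob.glob(os.path.join(so_dir, "libtransformer_engine.so"))
torch_ext = glob.glob(os.path.join(so_dir, "transformer_engine_torch.*.so"))
print("core library present:", bool(core_lib))
print("PyTorch extension present:", bool(torch_ext))
```

If the core library is present but the PyTorch extension is missing, rebuild with NVTE_FRAMEWORK=pytorch as shown above.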

timmoon10 avatar Oct 10 '24 23:10 timmoon10

Please help me solve this:

Traceback (most recent call last):
  File "/home/iitbbsr/Documents/netram/NvidiaCosmos4/cosmos-predict1/cosmos_predict1/diffusion/inference/video2world.py", line 20, in <module>
    from megatron.core import parallel_state
  File "/home/iitbbsr/Documents/netram/NvidiaCosmos4/Megatron-LM/megatron/core/__init__.py", line 5, in <module>
    from megatron.core.distributed import DistributedDataParallel
  File "/home/iitbbsr/Documents/netram/NvidiaCosmos4/Megatron-LM/megatron/core/distributed/__init__.py", line 8, in <module>
    from .torch_fully_sharded_data_parallel import TorchFullyShardedDataParallel
  File "/home/iitbbsr/Documents/netram/NvidiaCosmos4/Megatron-LM/megatron/core/distributed/torch_fully_sharded_data_parallel.py", line 16, in <module>
    from ..models.common.embeddings.language_model_embedding import LanguageModelEmbedding
  File "/home/iitbbsr/Documents/netram/NvidiaCosmos4/Megatron-LM/megatron/core/models/common/embeddings/__init__.py", line 3, in <module>
    from .rope_utils import apply_rotary_pos_emb
  File "/home/iitbbsr/Documents/netram/NvidiaCosmos4/Megatron-LM/megatron/core/models/common/embeddings/rope_utils.py", line 32, in <module>
    from megatron.core.extensions.transformer_engine import fused_apply_rotary_pos_emb_thd
  File "/home/iitbbsr/Documents/netram/NvidiaCosmos4/Megatron-LM/megatron/core/extensions/transformer_engine.py", line 94, in <module>
    class TELinear(te.pytorch.Linear):
AttributeError: module 'transformer_engine' has no attribute 'pytorch'

sibani-pngrh avatar Jul 02 '25 11:07 sibani-pngrh

same

eliohead avatar Aug 11 '25 15:08 eliohead

The same issue!

RaiaN avatar Aug 12 '25 13:08 RaiaN

same

Roger-Lv avatar Aug 25 '25 11:08 Roger-Lv

+1 the same issue

OutofAi avatar Sep 08 '25 00:09 OutofAi

same

meettyj avatar Sep 11 '25 23:09 meettyj

same error

xuyoji avatar Sep 14 '25 14:09 xuyoji

same error, I find it strange

nanfangxiansheng avatar Sep 19 '25 04:09 nanfangxiansheng

Try replacing flash-attn-3 / flash_attn_3 with flash-attn / flash_attn in transformer_engine/pytorch/attention/dot_product_attention/backends.py?

tkob-vh avatar Sep 22 '25 09:09 tkob-vh

same error

chenhongjin811 avatar Sep 30 '25 13:09 chenhongjin811

This error message is hard for us to debug. As a convenience, the root transformer_engine package attempts to import the extensions for both PyTorch and JAX. However, it's unlikely that users have both available, so we suppress import errors. To get a better error message, please try import transformer_engine.pytorch or import transformer_engine.jax.
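The suggested check can be done in one short script (a sketch, not TE API; it catches Exception broadly because some older builds raised StopIteration rather than ImportError, as in the traceback earlier in this thread):

```python
# Sketch: try each framework extension directly so the root package's
# error suppression doesn't hide the real failure.
results = {}
for mod in ("transformer_engine.pytorch", "transformer_engine.jax"):
    try:
        __import__(mod)
        results[mod] = "OK"
    except Exception as err:  # ImportError, or StopIteration on older builds
        results[mod] = f"{type(err).__name__}: {err}"
for mod, status in results.items():
    print(mod, "->", status)
```

Whichever framework you actually use should report "OK"; the other failing is expected.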

timmoon10 avatar Oct 09 '25 02:10 timmoon10