
Cannot import and use transformer_engine after successful installation with No module named 'transformer_engine_extensions'

sam-h-bean opened this issue 1 year ago • 4 comments

I have installed transformer_engine for use with Accelerate and Ray. The following requirements work fine for all sorts of distributed training:

torch==2.2.1
transformers==4.39.3
accelerate==0.29.2
deepspeed==0.14.0
datasets==2.18.0
sentencepiece==0.2.0
transformer_engine @ git+https://github.com/NVIDIA/TransformerEngine.git@stable

For reference, my Dockerfile looks like this:

FROM anyscale/ray:2.11.0-py39-cu121

COPY ./training-requirements.txt training-requirements.txt

RUN pip install -r training-requirements.txt --global-option=--debug

RUN python -c "import transformer_engine.pytorch as te"

I have also tried different versions of torch, including the latest.

I can import transformer_engine just fine when this image is deployed:

>>> import transformer_engine
>>> 

However, when I try to import the PyTorch module I get:

>>> import transformer_engine.pytorch as te
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ray/anaconda3/lib/python3.9/site-packages/transformer_engine/pytorch/__init__.py", line 6, in <module>
    from .module import LayerNormLinear
  File "/home/ray/anaconda3/lib/python3.9/site-packages/transformer_engine/pytorch/module/__init__.py", line 6, in <module>
    from .layernorm_linear import LayerNormLinear
  File "/home/ray/anaconda3/lib/python3.9/site-packages/transformer_engine/pytorch/module/layernorm_linear.py", line 13, in <module>
    from .. import cpp_extensions as tex
  File "/home/ray/anaconda3/lib/python3.9/site-packages/transformer_engine/pytorch/cpp_extensions/__init__.py", line 6, in <module>
    from transformer_engine_extensions import *
ModuleNotFoundError: No module named 'transformer_engine_extensions'

And when I try to import transformer_engine_extensions directly, I get the same error:

>>> import transformer_engine_extensions
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'transformer_engine_extensions'
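
A quick way to confirm the compiled extension simply isn't on the path (rather than being broken at import time) is importlib.util.find_spec, which returns None instead of raising. This is just a diagnostic sketch; extension_available is a helper name I made up:

```python
import importlib.util

def extension_available(name: str) -> bool:
    """True if a module named `name` can be found without importing it."""
    return importlib.util.find_spec(name) is not None

# In the failing container this prints False, confirming the compiled
# extension was never built, rather than built but failing to load.
print(extension_available("transformer_engine_extensions"))
```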

I'm wondering what is going on with the versions. Everything else works fine with torch; it's just that transformer_engine seems to be installed incorrectly.

Here are some Nvidia specs as well

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
$ nvidia-smi
Sat May 18 16:47:56 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|

sam-h-bean avatar May 18 '24 23:05 sam-h-bean

Do you know if PyTorch is installed before you install the requirements.txt? TE supports DL frameworks other than PyTorch (e.g. JAX), so part of our build process involves checking what frameworks are installed. I suspect that TE doesn't find PyTorch, so it skips building the PyTorch extensions. To force it to build the PyTorch extensions, can you try setting NVTE_FRAMEWORK=pytorch in the environment?
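
In the Dockerfile above, that could look like this (a sketch; the ENV line has to come before the pip install so the build picks it up):

```dockerfile
COPY ./training-requirements.txt training-requirements.txt

# Force TE's build to target PyTorch even if torch isn't importable at build time
ENV NVTE_FRAMEWORK=pytorch
RUN pip install -r training-requirements.txt --global-option=--debug
```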

timmoon10 avatar May 21 '24 23:05 timmoon10

@timmoon10 Thanks. If anyone wants to import transformer_engine_extensions after successfully building the PyTorch extensions, this is how you do it:

import torch
import transformer_engine
import transformer_engine_extensions

I'm using CUDA 12.5 with the nightly CUDA 12.4 PyTorch build on Artix Linux.

GUUser91 avatar Jun 01 '24 10:06 GUUser91

Not working for the current main branch of TE.

1049451037 avatar Jun 11 '24 09:06 1049451037

Solved. The transformer_engine_extensions module has been renamed to transformer_engine_torch.
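
For code that needs to run against both old and new TE versions, a generic fallback helper can try the new name first and fall back to the old one. This is a sketch; import_first is a hypothetical helper, not part of TE's API:

```python
import importlib

def import_first(*names):
    """Return the first importable module from `names`."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError:
            continue
    raise ModuleNotFoundError(f"none of {names} could be imported")

# Usage: prefer the new name (current main), fall back to older releases.
# tex = import_first("transformer_engine_torch", "transformer_engine_extensions")
```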

1049451037 avatar Jun 11 '24 10:06 1049451037