No module named "torchgen"

Open dinghaodhd opened this issue 1 year ago • 10 comments

❓ Questions and Help

Hi, following the build-from-source guide at https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md#build-from-source, I built PyTorch successfully. When I then build XLA, the build fails with "No module named 'torchgen'". What should I do to solve this problem?

INFO: Analyzed target //:_XLAC.so (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /data/harward/lmcode/aiframework/Pytorch/xla/torch_xla/csrc/BUILD:6:8: Executing genrule //torch_xla/csrc:gen_lazy_tensor failed: (Exit 1): bash failed: error executing command /bin/bash -c ... (remaining 1 argument skipped)
Traceback (most recent call last):
  File "/home/harward/.cache/bazel/_bazel_harward/e80ff508ed67f80b0b8a833af2da283f/execroot/__main__/bazel-out/k8-opt/bin/codegen/lazy_tensor_generator.runfiles/__main__/codegen/lazy_tensor_generator.py", line 6, in <module>
    from torchgen.api.lazy import LazyIrSchema
ModuleNotFoundError: No module named 'torchgen'
Target //:_XLAC.so failed to build

dinghaodhd avatar Dec 13 '23 07:12 dinghaodhd

@ManfeiBai can you help?

My guess is that the PyTorch installation on your machine has an issue. On my dev machine, after building PyTorch, I am able to import torchgen:

root@t1v-n-8e893749-w-0:/ansible# python
Python 3.8.18 (default, Nov 21 2023, 19:23:22) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchgen
>>>

JackCaoG avatar Dec 13 '23 18:12 JackCaoG

@JackCaoG Actually, I can also import torchgen the same way you do, but strangely, building XLA still fails with the torchgen import error. By the way, I am using Anaconda, so I wonder whether there are limitations when building XLA with Anaconda?

(xla) harward@njpc130:/data/harward/lmcode/aiframework/Pytorch/xla$ python3
Python 3.8.10 (default, Jun 4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchgen
>>>
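
A successful `import torchgen` in one interpreter says nothing about another interpreter on the same machine, so it may help to record exactly which interpreter answered the check. The following is a minimal sketch, not part of the PyTorch/XLA tooling; the helper name `describe_interpreter` is hypothetical.

```python
import importlib.util
import shutil
import sys


def describe_interpreter():
    """Report which Python is running and where it finds torchgen (if at all)."""
    # find_spec returns None for a missing top-level module instead of raising.
    spec = importlib.util.find_spec("torchgen")
    return {
        "executable": sys.executable,           # interpreter running this script
        "prefix": sys.prefix,                   # its installation / env prefix
        "torchgen": spec.origin if spec is not None else None,
        "python3_on_path": shutil.which("python3"),  # first python3 found on PATH
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running it once inside the activated conda env and once outside it gives two reports that can be compared side by side.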

dinghaodhd avatar Dec 14 '23 02:12 dinghaodhd

Yeah, I have just been using our dev Docker image described in https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md#building-manually, where all the necessary build tools are already set up.

JackCaoG avatar Dec 14 '23 18:12 JackCaoG

thanks, will start to repro

ManfeiBai avatar Dec 15 '23 22:12 ManfeiBai

@dinghaodhd, I failed to repro this on my TPU. Which environment are you building PyTorch/XLA on: CPU, GPU, or TPU? Would you mind sharing your build commands too?

The commands I used locally to build torch_xla are:

# Please install the latest Miniconda3 first, before running the following commands
source ~/.bashrc
conda create --name torch310 python=3.10
conda activate torch310

export _GLIBCXX_USE_CXX11_ABI=1
conda install cmake ninja

conda uninstall -c conda-forge gcc= gxx
sudo apt remove gcc g++
sudo apt-get install gcc-10 g++-10
sudo ln -s /usr/bin/gcc-10 /usr/local/bin/gcc
sudo ln -s /usr/bin/g++-10 /usr/local/bin/g++
source ~/.bashrc

git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export CC=gcc
export CXX=g++
python setup.py develop

# please install bazel before the following commands

git clone https://github.com/pytorch/xla.git
cd xla
python setup.py develop

ManfeiBai avatar Dec 18 '23 19:12 ManfeiBai

@ManfeiBai, I ran into the same issue, so I tried your commands. Unfortunately, the following error occurs:

-- Configuring done (7.5s)
CMake Error: CMake can not determine linker language for target: dnnl_cpu
CMake Error: CMake can not determine linker language for target: dnnl_cpu_x64
CMake Error: CMake can not determine linker language for target: dnnl_graph_interface
CMake Error: CMake can not determine linker language for target: dnnl_graph_backend_fake
CMake Error: CMake can not determine linker language for target: dnnl_graph_backend_dnnl
CMake Error: CMake can not determine linker language for target: dnnl_graph_utils
CMake Warning at caffe2/CMakeLists.txt:813 (add_library):
  Cannot generate a safe runtime search path for target torch_cpu because
  files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libgomp.so.1] in /usr/lib/gcc/x86_64-linux-gnu/10 may be hidden by files in:
      /home/cad/anaconda3/lib

  Some of these libraries may not be found correctly.

-- Generating done (0.8s)
CMake Generate step failed. Build files cannot be regenerated correctly.

wwtghx avatar Dec 21 '23 11:12 wwtghx

My environment: OS: Ubuntu 22.04, device: CPU

wwtghx avatar Dec 21 '23 11:12 wwtghx

I found a way to build without errors:

  1. Exit the conda env, then build PyTorch with the system Python (Python 3.8 is installed).
  2. Enter the conda env, then build XLA; XLA builds successfully.

So I believe that Bazel uses the system's Python 3.8 rather than conda's Python 3.10 when I build pytorch/xla inside the conda env.

Am I right? How can I make the build use conda's Python 3.10 when building pytorch/xla in the conda env?
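
The hypothesis above can be checked directly by asking each candidate interpreter whether it can import torchgen. The following is a rough sketch only: the helper name `can_import` is hypothetical, and `/usr/bin/python3` is an assumed location for the system Python that may differ on other machines.

```python
import subprocess
import sys


def can_import(python_path, module="torchgen"):
    """Return True if the interpreter at python_path can import the module."""
    try:
        result = subprocess.run(
            [python_path, "-c", f"import {module}"],
            capture_output=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The interpreter is missing, not executable, or hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Assumed candidates: the system Python and the interpreter running this script.
    for candidate in ("/usr/bin/python3", sys.executable):
        status = "ok" if can_import(candidate) else "FAILED"
        print(f"{candidate}: import torchgen -> {status}")
```

If the system interpreter fails while the conda one succeeds, that matches the picture of the build picking up the wrong Python.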

wwtghx avatar Dec 21 '23 11:12 wwtghx

Is root privilege required when building PyTorch and XLA from source?

wwtghx avatar Dec 21 '23 11:12 wwtghx

It seems that Bazel still uses the system Python binary even when the conda environment is activated. I was able to fix the issue either by passing --action_env=PYTHON_BIN_PATH=/home/user/.conda/envs/xla_build/bin/python3 to Bazel or by running export PYTHON_BIN_PATH=/home/user/.conda/envs/xla_build/bin/python3 before the build.

tdakhran avatar Feb 12 '24 10:02 tdakhran
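
To avoid hard-coding the env path, the `PYTHON_BIN_PATH` workaround reported above could be derived from whichever interpreter is currently active. This is only a sketch under that assumption: the helper name `build_env` is hypothetical, and whether the build honors the variable rests on the report in this thread rather than on anything verified here.

```python
import os
import shlex
import sys


def build_env(python_path=sys.executable):
    """Return a copy of the current environment with PYTHON_BIN_PATH set."""
    env = dict(os.environ)
    env["PYTHON_BIN_PATH"] = python_path
    return env


if __name__ == "__main__":
    env = build_env()
    # Print the equivalent shell line so it can be pasted before the build.
    print("export PYTHON_BIN_PATH=" + shlex.quote(env["PYTHON_BIN_PATH"]))
    # To launch the build with this environment instead (assumes the current
    # directory is the xla checkout), one could run:
    #   subprocess.run([sys.executable, "setup.py", "develop"], env=env, check=True)
```

Run from inside the activated conda env, the printed path points at that env's interpreter.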