No module named "torchgen"
❓ Questions and Help
Hi: Following the guide at https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md#build-from-source I built pytorch successfully, but building xla fails with "No module named 'torchgen'". What should I do to solve this problem?
INFO: Analyzed target //:_XLAC.so (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /data/harward/lmcode/aiframework/Pytorch/xla/torch_xla/csrc/BUILD:6:8: Executing genrule //torch_xla/csrc:gen_lazy_tensor failed: (Exit 1): bash failed: error executing command /bin/bash -c ... (remaining 1 argument skipped)
Traceback (most recent call last):
File "/home/harward/.cache/bazel/_bazel_harward/e80ff508ed67f80b0b8a833af2da283f/execroot/__main__/bazel-out/k8-opt/bin/codegen/lazy_tensor_generator.runfiles/__main__/codegen/lazy_tensor_generator.py", line 6, in <module>
from torchgen.api.lazy import LazyIrSchema
ModuleNotFoundError: No module named 'torchgen'
Target //:_XLAC.so failed to build
@ManfeiBai can you help?
My guess is that the pytorch installation on your machine has an issue. On my dev machine, after building pytorch, I am able to import torchgen:
root@t1v-n-8e893749-w-0:/ansible# python
Python 3.8.18 (default, Nov 21 2023, 19:23:22)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchgen
>>>
@JackCaoG Actually, I can also import torchgen the way you do, but strangely, building xla still fails with the torchgen import error. By the way, I am using Anaconda, so maybe there are some limitations when building xla with Anaconda?
(xla) harward@njpc130:/data/harward/lmcode/aiframework/Pytorch/xla$ python3
Python 3.8.10 (default, Jun 4 2021, 15:09:15)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchgen
>>>
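A successful import in an interactive shell only proves that the interpreter first on PATH can see torchgen; a build tool may launch a different one. The following is a minimal diagnostic sketch, not part of the xla build. The helper names `has_module` and `candidate_interpreters` are hypothetical, and `/usr/bin/python3` is only an assumed location for the system interpreter. It asks each candidate interpreter, in a subprocess, whether it can locate torchgen.

```python
import shutil
import subprocess
import sys


def has_module(interpreter: str, module: str) -> bool:
    """Return True if `interpreter` can locate the top-level `module`."""
    # The probe runs inside the target interpreter and exits 0 only when
    # importlib finds the module on that interpreter's own sys.path.
    probe = (
        "import importlib.util, sys; "
        f"sys.exit(0 if importlib.util.find_spec({module!r}) else 1)"
    )
    try:
        result = subprocess.run(
            [interpreter, "-c", probe], capture_output=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        # Interpreter is missing, not executable, or hung.
        return False
    return result.returncode == 0


def candidate_interpreters() -> list:
    """Collect interpreters a build might pick up, without duplicates."""
    candidates = [
        sys.executable,            # interpreter running this script
        shutil.which("python3"),   # first python3 on PATH
        shutil.which("python"),    # first python on PATH
        "/usr/bin/python3",        # assumed system interpreter location
    ]
    seen = []
    for path in candidates:
        if path and path not in seen:
            seen.append(path)
    return seen


if __name__ == "__main__":
    for path in candidate_interpreters():
        status = "found" if has_module(path, "torchgen") else "MISSING"
        print(f"{path}: torchgen {status}")
```

If the conda interpreter reports torchgen as found while the system interpreter reports it missing, that is consistent with the build invoking the system interpreter instead of the conda one.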
Yeah, I have just been using our dev docker image from https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md#building-manually, where all the necessary build tools are set up.
thanks, will start to repro
@dinghaodhd, I failed to repro this on my TPU. Which environment are you building ptxla on: CPU, GPU, or TPU? Would you mind sharing your build commands too?
The commands I used locally to build torch_xla are:
# please install the latest Miniconda3 on your side first before the following commands
source ~/.bashrc
conda create --name torch310 python=3.10
conda activate torch310
export _GLIBCXX_USE_CXX11_ABI=1
conda install cmake ninja
conda uninstall -c conda-forge gcc= gxx
sudo apt remove gcc g++
sudo apt-get install gcc-10 g++-10
sudo ln -s /usr/bin/gcc-10 /usr/local/bin/gcc
sudo ln -s /usr/bin/g++-10 /usr/local/bin/g++
source ~/.bashrc
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export CC=gcc
export CXX=g++
python setup.py develop
# please install bazel before the following commands
git clone https://github.com/pytorch/xla.git
cd xla
python setup.py develop
@ManfeiBai, I hit the same issue, so I tried your commands.
Unfortunately, the following error occurs:
-- Configuring done (7.5s)
CMake Error: CMake can not determine linker language for target: dnnl_cpu
CMake Error: CMake can not determine linker language for target: dnnl_cpu_x64
CMake Error: CMake can not determine linker language for target: dnnl_graph_interface
CMake Error: CMake can not determine linker language for target: dnnl_graph_backend_fake
CMake Error: CMake can not determine linker language for target: dnnl_graph_backend_dnnl
CMake Error: CMake can not determine linker language for target: dnnl_graph_utils
CMake Warning at caffe2/CMakeLists.txt:813 (add_library):
  Cannot generate a safe runtime search path for target torch_cpu because
  files in some directories may conflict with libraries in implicit directories:
    runtime library [libgomp.so.1] in /usr/lib/gcc/x86_64-linux-gnu/10 may be hidden by files in:
      /home/cad/anaconda3/lib
  Some of these libraries may not be found correctly.
-- Generating done (0.8s)
CMake Generate step failed. Build files cannot be regenerated correctly.
My environment: OS: Ubuntu 22.04, device: CPU
I found a build method that produces no errors:
- Exit the conda env, then build pytorch with the system Python (Python 3.8 is installed);
- Enter the conda env, then build xla; xla builds successfully.
So I believe bazel uses the system's python3.8, not conda's python3.10, when I build pytorch/xla inside the conda env.
Am I right? How can I make bazel use conda's python3.10 when building pytorch/xla in a conda env?
Is root privilege required to build the pytorch and xla source code?
It seems that bazel still uses the system python binary even when the conda environment is activated. I was able to fix the issue by passing --action_env=PYTHON_BIN_PATH=/home/user/.conda/envs/xla_build/bin/python3 to bazel, or by running export PYTHON_BIN_PATH=/home/user/.conda/envs/xla_build/bin/python3 before the build.
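To avoid typing the environment path by hand, both forms of the setting can be derived from whichever interpreter is currently active. This is only a sketch under assumptions: the helper name `python_bin_path_settings` is hypothetical, and it must be run with the conda environment's python so that `sys.executable` points at the interpreter that has torchgen installed.

```python
import shlex
import sys


def python_bin_path_settings(interpreter: str) -> dict:
    """Build the two PYTHON_BIN_PATH forms mentioned above for `interpreter`."""
    # Quote the path so it survives a shell even if it contains spaces.
    quoted = shlex.quote(interpreter)
    return {
        "export": f"export PYTHON_BIN_PATH={quoted}",
        "bazel_flag": f"--action_env=PYTHON_BIN_PATH={quoted}",
    }


if __name__ == "__main__":
    settings = python_bin_path_settings(sys.executable)
    print(settings["export"])      # line to run in the shell before building
    print(settings["bazel_flag"])  # flag to pass to bazel instead
```

Run inside the activated environment, the first printed line can be pasted into the shell before `python setup.py develop`.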