Version mismatch when building from source

Open ehuaa opened this issue 1 year ago • 7 comments

After cloning this project, I tried to compile it from source. When I ran bash ./scripts/build_pytorch_blade.sh, I got this error:

[image: 1679227207233]

My PyTorch versions are torch.__version__ '2.1.0.dev20230316+cpu' and torchvision.__version__ '0.16.0.dev20230316+cpu',

which are the latest versions available from pip. Is there a way to install BladeDISC successfully without downgrading PyTorch or torchvision, for example by making a small change to build_pytorch_blade.sh or requirements.txt? Thank you very much!

ehuaa avatar Mar 19 '23 12:03 ehuaa

@wyzero

ehuaa avatar Mar 19 '23 12:03 ehuaa

@ehuaa The error occurs because the script tries to install torch==1.7.1+cu110 as a dependency; the version is configured via TORCH_BLADE_CI_BUILD_TORCH_VERSION, see https://github.com/alibaba/BladeDISC/blob/main/pytorch_blade/scripts/build_pytorch_blade.sh#L32.

BladeDISC already supports torch 2.0, so you can skip the torch pip installation step in build_pytorch_blade.sh.
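Before editing the script, it can help to confirm how far the installed torch is from the pinned one. The following is a minimal sketch, not part of BladeDISC: `parse_torch_version` and `needs_reinstall` are hypothetical helper names, and the two version strings are simply the ones quoted in this thread.

```python
import re
from typing import Tuple


def parse_torch_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a string such as '2.1.0.dev20230316+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_reinstall(installed: str, pinned: str) -> bool:
    """True when installed and pinned differ in major or minor version."""
    return parse_torch_version(installed) != parse_torch_version(pinned)


if __name__ == "__main__":
    installed = "2.1.0.dev20230316+cpu"  # version reported in this thread
    pinned = "1.7.1+cu110"               # version the script tries to install
    print(parse_torch_version(installed))
    print(needs_reinstall(installed, pinned))
```

In a real environment the `installed` string would come from `torch.__version__`; it is hard-coded here so the sketch runs without torch.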

tanyokwok avatar Mar 19 '23 12:03 tanyokwok

Okay, thanks, I will try it later. Also, I did not use the Docker image when building from source; I wonder if I missed some steps...

ehuaa avatar Mar 19 '23 13:03 ehuaa

ERROR: @local_config_cuda//:enable_cuda :: Error loading option @local_config_cuda//:enable_cuda: no such package '@llvm-raw//utils/bazel': java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz, https://github.com/llvm/llvm-project/archive/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz] to /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/external/llvm-raw/temp14641944661529926033/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz: Premature EOF
Traceback (most recent call last):
  File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 151, in <module>
    setup(
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/__init__.py", line 108, in setup
    return distutils.core.setup(**attrs)
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/dist.py", line 1221, in run_command
    super().run_command(command)
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 107, in run
    self.cpp_run()
  File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 91, in cpp_run
    build.test()
  File "/home/banach/BladeDISC/pytorch_blade/bazel_build.py", line 283, in test
    subprocess.check_call(test_cmd, shell=True, env=env, executable="/bin/bash")
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -e; set -o pipefail; source .bazel_pyenv/bin/activate; bazel test --action_env PYTHON_BIN_PATH=/home/banach/anaconda3/envs/p310/bin/python3 --action_env BAZEL_LINKLIBS=-lstdc++ --action_env CC=/usr/bin/gcc --action_env CXX=/usr/bin/g++ --action_env DISC_FOREIGN_MAKE_JOBS=32 --copt=-DPYTORCH_VERSION_STRING="2.1.0.dev20230319+cu117" --copt=-DPYTORCH_MAJOR_VERSION=2 --copt=-DPYTORCH_MINOR_VERSION=1 --copt=-DTORCH_BLADE_CUDA_VERSION=11.7 --action_env TORCH_BLADE_TORCH_INSTALL_PATH=/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/torch --copt=-DPYBIND11_COMPILER_TYPE="_gcc" --copt=-DPYBIND11_STDLIB="_libstdcpp" --copt=-DPYBIND11_BUILD_ABI="_cxxabi1011" --config=torch_debug --config=torch_tensorrt --action_env TENSORRT_INSTALL_PATH=/usr/local/TensorRT/ --action_env NVCC=/usr/local/cuda/bin/nvcc --config=torch_enable_quantization --config=torch_cxx11abi_0 --config=torch_cuda //tests/mhlo/... //pytorch_blade:torch_blade_test_suite //tests/torch-disc-pdll/tests/... //tests/torchscript/...' returned non-zero exit status 2.

I ran into this error after I commented out the torch pip installation. The URL above cannot be reached. Is there anything wrong with my installation steps? I didn't run nvidia docker. @tanyokwok

ehuaa avatar Mar 22 '23 03:03 ehuaa

@ehuaa The BladeDISC workspace is built with Bazel, so Bazel resolves many of the project's third-party dependencies.

The error was probably caused by a download failure. Please check your network and retry.
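One way to act on "check your network and retry" is to probe the archive URL a few times with a growing delay before re-running the build. This is a hedged sketch using only the Python standard library; `retry` and `is_reachable` are hypothetical helpers, not something Bazel or BladeDISC provides. A successful HEAD request only shows the host answers; it does not prove the full archive downloads without a "Premature EOF".

```python
import time
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call fn until it succeeds or attempts run out, then re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for i in range(attempts):
        try:
            return fn()
        except OSError as err:  # urllib's URLError and HTTPError derive from OSError
            last_error = err
            if i + 1 < attempts:
                time.sleep(delay * (2 ** i))  # exponential backoff between tries
    raise last_error


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the server answered with 2xx or 3xx."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return 200 <= response.status < 400


# Example call (performs real network access, so it is left commented out):
# retry(lambda: is_reachable("https://github.com/llvm/llvm-project/archive/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz"))
```

If the probe keeps failing, the problem is on the network side (proxy, firewall, mirror availability) rather than in the build steps.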

tanyokwok avatar Mar 22 '23 06:03 tanyokwok

//tests/torchscript:since_1_14.graph.test FAILED in 0.8s
  /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/testlogs/tests/torchscript/since_1_14.graph.test/test.log

Executed 37 out of 37 tests: 36 tests pass and 1 fails locally.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
INFO: Build completed, 1 test FAILED, 12791 total actions
Traceback (most recent call last):
  File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 151, in <module>
    setup(
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/__init__.py", line 108, in setup
    return distutils.core.setup(**attrs)
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/dist.py", line 1221, in run_command
    super().run_command(command)
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 107, in run
    self.cpp_run()
  File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 91, in cpp_run
    build.test()
  File "/home/banach/BladeDISC/pytorch_blade/bazel_build.py", line 283, in test
    subprocess.check_call(test_cmd, shell=True, env=env, executable="/bin/bash")
  File "/home/banach/anaconda3/envs/p310/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -e; set -o pipefail; source .bazel_pyenv/bin/activate; bazel test --action_env PYTHON_BIN_PATH=/home/banach/anaconda3/envs/p310/bin/python3 --action_env BAZEL_LINKLIBS=-lstdc++ --action_env CC=/usr/bin/gcc --action_env CXX=/usr/bin/g++ --action_env DISC_FOREIGN_MAKE_JOBS=32 --copt=-DPYTORCH_VERSION_STRING="2.1.0.dev20230319+cu117" --copt=-DPYTORCH_MAJOR_VERSION=2 --copt=-DPYTORCH_MINOR_VERSION=1 --copt=-DTORCH_BLADE_CUDA_VERSION=11.7 --action_env TORCH_BLADE_TORCH_INSTALL_PATH=/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/torch --copt=-DPYBIND11_COMPILER_TYPE="_gcc" --copt=-DPYBIND11_STDLIB="_libstdcpp" --copt=-DPYBIND11_BUILD_ABI="_cxxabi1011" --config=torch_debug --config=torch_tensorrt --action_env TENSORRT_INSTALL_PATH=/usr/local/TensorRT/ --action_env NVCC=/usr/local/cuda/bin/nvcc --config=torch_enable_quantization --config=torch_cxx11abi_0 --config=torch_cuda //tests/mhlo/... //pytorch_blade:torch_blade_test_suite //tests/torch-disc-pdll/tests/... //tests/torchscript/...' returned non-zero exit status 3.

After I fixed the network problem, the build failed on the one test shown above. Is that failure the cause of the traceback? The content of since_1_14.graph.test/test.log is:

TEST 'MLIR torchscript :: since_1_14.graph' FAILED
Script:

: 'RUN: at line 1'; /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/shape_analysis_tool --since 1.14.0 -f /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph | /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/llvm-project/llvm/FileCheck /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph

Exit Code: 1

Command Output (stderr):

terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():
Couldn't find an operator for aten::var.correction(Tensor self, int[1]? dim, *, int? correction, bool keepdim=False) -> Tensor. Do you have to update a set of hardcoded JIT ops?
failed shape propagation in this context. The above operation:
%4 : Tensor = aten::amax(%p1, %3, %1)
The inputs are:
Float(*, *, *, *) from %p1 : Float(*, *, *, *) = prim::Param()
int[] from %3 : int[] = prim::ListConstruct(%2)
bool from %1 : bool = prim::Constant[value=1]()
:
  %cst_1: int = prim::Constant[value=-1]()
  %dims : int[] = prim::ListConstruct(%cst_1)
  %1 : Tensor = aten::amax(%p1, %dims, %true)
                ~~~~ <--- HERE
  return (%1)

/home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph:15:17: error: CHECK-LABEL: expected string not found in input
// CHECK-LABEL: graph
                ^
<stdin>:1:6: note: scanning from here
graph(%self : Float(*, *)):
     ^
<stdin>:1:16: note: possible intended match here
graph(%self : Float(*, *)):
               ^

Input file: <stdin>
Check file: /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: graph(%self : Float(*, *)):
label:15'0         X~~~~~~~~~~~~~~~~~~~~~~ error: no match found
label:15'1                   ?             possible intended match
           2:   %1 : int = prim::Constant[value=0]()
label:15'0    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           3:   %2 : int = prim::Constant[value=1]()
label:15'0    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           4:   %3 : int = prim::Constant[value=32]()
label:15'0    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           5:   %4 : int = prim::Constant[value=512]()
label:15'0    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           6:   %5 : int[] = prim::ListConstruct(%3, %4, %2)
label:15'0    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           . . .

Looking forward to your reply, thanks! @tanyokwok
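On the question of whether one failing test explains the traceback: `subprocess.check_call` raises `CalledProcessError` whenever the command exits with a non-zero status, and Bazel documents exit code 3 as "build OK, but some tests failed or timed out". The sketch below reproduces that mechanism with a stand-in child process instead of bazel; `exit_status` is a hypothetical helper name used only for illustration.

```python
import subprocess
import sys
from typing import List


def exit_status(cmd: List[str]) -> int:
    """Run cmd the way check_call does and return its exit status instead of raising."""
    try:
        subprocess.check_call(cmd)
        return 0
    except subprocess.CalledProcessError as err:
        # check_call raises for any non-zero status; the code is kept on the exception
        return err.returncode


if __name__ == "__main__":
    # Stand-in for "bazel test ..." finishing with one failed test (exit code 3).
    fake_bazel = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(exit_status(fake_bazel))
```

Under that reading, the Python traceback is only the wrapper reporting bazel's exit code, and the actual problem is the failing since_1_14.graph test.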

ehuaa avatar Mar 22 '23 10:03 ehuaa

I fixed this problem by pulling the latest PR you committed last week, thanks!

ehuaa avatar Mar 22 '23 13:03 ehuaa