BladeDISC
Version mismatch when I build from source code
After I git cloned this project, I tried to compile it from source. When I ran bash ./scripts/build_pytorch_blade.sh I got this error, while my torch.__version__ is '2.1.0.dev20230316+cpu' and torchvision.__version__ is '0.16.0.dev20230316+cpu',
which are the latest versions straight from pip. Can you tell me a way to install BladeDISC successfully without downgrading my pytorch or torchvision, by changing build_pytorch_blade.sh or requirements.txt a little bit? Thank you very much!
@wyzero
@ehuaa The issue was raised because the script wants to install torch==1.7.1+cu110 by default; this is configured via TORCH_BLADE_CI_BUILD_TORCH_VERSION, see https://github.com/alibaba/BladeDISC/blob/main/pytorch_blade/scripts/build_pytorch_blade.sh#L32.
BladeDISC already supports torch 2.0; you can skip the torch pip installation in the script build_pytorch_blade.sh.
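For what it's worth, here is a minimal sketch of that workaround, assuming you want to keep the torch 2.x already installed in your environment; the exact pip install line inside the script may differ between revisions, so check it before editing:

```bash
# Goal: keep the locally installed torch 2.x instead of letting the script
# pip-install torch==1.7.1+cu110 (the version selected via
# TORCH_BLADE_CI_BUILD_TORCH_VERSION).
cd pytorch_blade

# 1. Open scripts/build_pytorch_blade.sh and comment out the pip install of
#    torch/torchvision driven by TORCH_BLADE_CI_BUILD_TORCH_VERSION
#    (around the line referenced above). The exact command may differ.

# 2. Confirm the interpreter used by the build still sees your torch 2.x:
python -c "import torch; print(torch.__version__)"

# 3. Rerun the build without the forced downgrade:
bash ./scripts/build_pytorch_blade.sh
```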
Okay, thanks, I will try it later. Also, the docker image was not used during my build from source; I wonder if I missed some steps...
ERROR: @local_config_cuda//:enable_cuda :: Error loading option @local_config_cuda//:enable_cuda: no such package '@llvm-raw//utils/bazel': java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz, https://github.com/llvm/llvm-project/archive/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz] to /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/external/llvm-raw/temp14641944661529926033/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz: Premature EOF
Traceback (most recent call last):
File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 151, in
@ehuaa The BladeDISC workspace is built with bazel, so we use bazel to resolve many of the project's third-party dependencies.
The error was most likely caused by a download failure. Please check your network and retry.
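If it keeps failing, here is a sketch of things to try, assuming the machine needs a proxy to reach github.com / storage.googleapis.com (HTTP_PROXY/HTTPS_PROXY are standard variables honored by bazel's downloader; the proxy address below is a placeholder):

```bash
# Standard proxy variables; bazel's repository downloader picks these up.
export HTTP_PROXY=http://your-proxy:3128    # placeholder, replace with your proxy
export HTTPS_PROXY=http://your-proxy:3128

# Discard the partially downloaded llvm-project archive so it is fetched again.
bazel clean --expunge

# Retry the build; the external repositories will be re-downloaded.
bash ./scripts/build_pytorch_blade.sh
```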
**/tests/torchscript:since_1_14.graph.test FAILED in 0.8s /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/testlogs/tests/torchscript/since_1_14.graph.test/test.log
Executed 37 out of 37 tests: 36 tests pass and 1 fails locally.**
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
INFO: Build completed, 1 test FAILED, 12791 total actions
Traceback (most recent call last):
File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 151, in
After I fixed the network problem, I failed with the one test above. Is this the cause of the traceback above? The info in since_1_14.graph.test/test.log is:
TEST 'MLIR torchscript :: since_1_14.graph' FAILED
Script:
: 'RUN: at line 1'; /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/shape_analysis_tool --since 1.14.0 -f /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph | /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/llvm-project/llvm/FileCheck /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph
Exit Code: 1
Command Output (stderr):
terminate called after throwing an instance of 'torch::jit::ErrorReport'
what():
Couldn't find an operator for aten::var.correction(Tensor self, int[1]? dim, *, int? correction, bool keepdim=False) -> Tensor. Do you have to update a set of hardcoded JIT ops?
failed shape propagation in this context. The above operation:
%4 : Tensor = aten::amax(%p1, %3, %1)
The inputs are:
Float(*, *, *, *) from %p1 : Float(*, *, *, *) = prim::Param()
int[] from %3 : int[] = prim::ListConstruct(%2)
bool from %1 : bool = prim::Constant[value=1]()
:
%cst_1 : int = prim::Constant[value=-1]()
%dims : int[] = prim::ListConstruct(%cst_1)
%1 : Tensor = aten::amax(%p1, %dims, %true)
~~~~ <--- HERE
return (%1)
/home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph:15:17: error: CHECK-LABEL: expected string not found in input
// CHECK-LABEL: graph
^
Input file: <stdin>
-dump-input=help explains the following input dump.
Input was:
<<<<<<
1: graph(%self : Float(*, *)):
label:15'0 X~~~~~~~~~~~~~~~~~~~~~~ error: no match found
label:15'1 ? possible intended match
2: %1 : int = prim::Constant[value=0]()
label:15'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3: %2 : int = prim::Constant[value=1]()
label:15'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4: %3 : int = prim::Constant[value=32]()
label:15'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5: %4 : int = prim::Constant[value=512]()
label:15'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6: %5 : int[] = prim::ListConstruct(%3, %4, %2)
label:15'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
. . .
Looking forward to your reply, thanks! @tanyokwok
I fixed this problem by pulling the latest PR you committed last week, thanks!
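For anyone else who lands here, the sequence that resolved it was roughly the following (a sketch; the specific PR is not referenced by number in this thread):

```bash
# Update the checkout to pick up the recent shape-analysis fix, then rebuild.
cd BladeDISC
git pull origin main
git submodule update --init --recursive   # only needed if the repo's submodules changed; harmless otherwise
cd pytorch_blade
bash ./scripts/build_pytorch_blade.sh
```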