PyTorch/XLA `gtest` needs to be updated
I am running into the following error when building the PyTorch/XLA C++ tests using `python setup.py install`.
Error report from: xla/test/cpp/build/gtest/src/googletest-stamp/googletest-build-err.log
In file included from xla/test/cpp/build/gtest/src/googletest-src/googletest/src/gtest-all.cc:42:
xla/test/cpp/build/gtest/src/googletest-src/googletest/src/gtest-death-test.cc: In function 'bool testing::internal::StackGrowsDown()':
xla/test/cpp/build/gtest/src/googletest-src/googletest/src/gtest-death-test.cc:1301:24: error: 'dummy' may be used uninitialized [-Werror=maybe-uninitialized]
1301 | StackLowerThanAddress(&dummy, &result);
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
xla/test/cpp/build/gtest/src/googletest-src/googletest/src/gtest-death-test.cc:1290:13: note: by argument 1 of type 'const void*' to 'void testing::internal::StackLowerThanAddress(const void*, bool*)' declared here
1290 | static void StackLowerThanAddress(const void* ptr, bool* result) {
| ^~~~~~~~~~~~~~~~~~~~~
xla/test/cpp/build/gtest/src/googletest-src/googletest/src/gtest-death-test.cc:1299:7: note: 'dummy' declared here
1299 | int dummy;
| ^~~~~
cc1plus: all warnings being treated as errors
make[5]: *** [googletest/CMakeFiles/gtest.dir/build.make:76: googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o] Error 1
make[4]: *** [CMakeFiles/Makefile2:172: googletest/CMakeFiles/gtest.dir/all] Error 2
make[3]: *** [Makefile:146: all] Error 2
Please investigate.
Blocker:
- [ ] https://github.com/tensorflow/tensorflow/issues/56021
It turns out this problem has existed with the current pin (i.e. commit `6f5fd0d7199b9a19faa` in googletest version 1.10) for quite some time (ref). The solution is to upgrade to version 1.11.0. Doing it now.
I tried a few pins from version 1.11.0, including `e2239ee6043f73722e7aa812a459f54a28552929` and `4679637f1c9d5a0728bdc347a531737fad0b1ca3`. None of them gave me a successful build.
Observations:
- The initial error that prompted this issue is indeed fixed in version 1.11.
- I ran into a new set of errors. A snippet of the errors I observed is listed below.
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:454:5: error: variable of non-literal type '::tensorflow::internal::CheckOpString' cannot be defined in a constexpr function
DCHECK_GE(width, 0) << "Unsupported width " << width;
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:472:31: note: expanded from macro 'DCHECK_GE'
#define DCHECK_GE(val1, val2) CHECK_GE(val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:459:30: note: expanded from macro 'CHECK_GE'
#define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:452:40: note: expanded from macro 'CHECK_OP'
#define CHECK_OP(name, op, val1, val2) CHECK_OP_LOG(name, op, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:445:48: note: expanded from macro 'CHECK_OP_LOG'
while (::tensorflow::internal::CheckOpString _result{ \
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:306:8: note: 'CheckOpString' is not literal because it is not an aggregate and has no constexpr constructors other than copy or move constructors
struct CheckOpString {
^
In file included from xla/test/cpp/torch_xla_test.cpp:10:
In file included from xla/torch_xla/csrc/helpers.h:12:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/client/xla_builder.h:31:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/client/xla_computation.h:22:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/shape.h:25:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/layout.h:25:
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:495:5: error: variable of non-literal type '::tensorflow::internal::CheckOpString' cannot be defined in a constexpr function
DCHECK_GE(exponent, 0);
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:472:31: note: expanded from macro 'DCHECK_GE'
#define DCHECK_GE(val1, val2) CHECK_GE(val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:459:30: note: expanded from macro 'CHECK_GE'
#define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:452:40: note: expanded from macro 'CHECK_OP'
#define CHECK_OP(name, op, val1, val2) CHECK_OP_LOG(name, op, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:445:48: note: expanded from macro 'CHECK_OP_LOG'
while (::tensorflow::internal::CheckOpString _result{ \
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:306:8: note: 'CheckOpString' is not literal because it is not an aggregate and has no constexpr constructors other than copy or move constructors
struct CheckOpString {
^
In file included from xla/test/cpp/torch_xla_test.cpp:10:
In file included from xla/torch_xla/csrc/helpers.h:12:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/client/xla_builder.h:31:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/client/xla_computation.h:22:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/shape.h:25:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/layout.h:25:
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:568:12: error: no matching function for call to 'LsbMask'
return LsbMask<uint64_t>(bits);
^~~~~~~~~~~~~~~~~
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:448:20: note: candidate template ignored: substitution failure [with T = unsigned long]
constexpr inline T LsbMask(int width)
^
In file included from xla/test/cpp/test_op_by_op_executor.cpp:3:
In file included from xla/test/cpp/cpp_test_util.h:12:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/xla_client/computation_client.h:13:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/client/xla_computation.h:22:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/shape.h:25:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/layout.h:25:
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:454:5: error: variable of non-literal type '::tensorflow::internal::CheckOpString' cannot be defined in a constexpr function
DCHECK_GE(width, 0) << "Unsupported width " << width;
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:472:31: note: expanded from macro 'DCHECK_GE'
#define DCHECK_GE(val1, val2) CHECK_GE(val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:459:30: note: expanded from macro 'CHECK_GE'
#define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:452:40: note: expanded from macro 'CHECK_OP'
#define CHECK_OP(name, op, val1, val2) CHECK_OP_LOG(name, op, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:445:48: note: expanded from macro 'CHECK_OP_LOG'
while (::tensorflow::internal::CheckOpString _result{ \
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:306:8: note: 'CheckOpString' is not literal because it is not an aggregate and has no constexpr constructors other than copy or move constructors
struct CheckOpString {
^
In file included from xla/test/cpp/test_op_by_op_executor.cpp:3:
In file included from xla/test/cpp/cpp_test_util.h:12:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/xla_client/computation_client.h:13:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/client/xla_computation.h:22:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/shape.h:25:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/layout.h:25:
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:495:5: error: variable of non-literal type '::tensorflow::internal::CheckOpString' cannot be defined in a constexpr function
DCHECK_GE(exponent, 0);
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:472:31: note: expanded from macro 'DCHECK_GE'
#define DCHECK_GE(val1, val2) CHECK_GE(val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:459:30: note: expanded from macro 'CHECK_GE'
#define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:452:40: note: expanded from macro 'CHECK_OP'
#define CHECK_OP(name, op, val1, val2) CHECK_OP_LOG(name, op, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:445:48: note: expanded from macro 'CHECK_OP_LOG'
while (::tensorflow::internal::CheckOpString _result{ \
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:306:8: note: 'CheckOpString' is not literal because it is not an aggregate and has no constexpr constructors other than copy or move constructors
struct CheckOpString {
^
In file included from xla/test/cpp/test_op_by_op_executor.cpp:3:
In file included from xla/test/cpp/cpp_test_util.h:12:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/xla_client/computation_client.h:13:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/client/xla_computation.h:22:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/shape.h:25:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/layout.h:25:
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:568:12: error: no matching function for call to 'LsbMask'
return LsbMask<uint64_t>(bits);
^~~~~~~~~~~~~~~~~
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:448:20: note: candidate template ignored: substitution failure [with T = unsigned long]
constexpr inline T LsbMask(int width)
^
In file included from xla/test/cpp/test_tensor.cpp:7:
In file included from xla/test/cpp/cpp_test_util.h:12:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/xla_client/computation_client.h:13:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/client/xla_computation.h:22:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/shape.h:25:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/layout.h:25:
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:454:5: error: variable of non-literal type '::tensorflow::internal::CheckOpString' cannot be defined in a constexpr function
DCHECK_GE(width, 0) << "Unsupported width " << width;
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:472:31: note: expanded from macro 'DCHECK_GE'
#define DCHECK_GE(val1, val2) CHECK_GE(val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:459:30: note: expanded from macro 'CHECK_GE'
#define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:452:40: note: expanded from macro 'CHECK_OP'
#define CHECK_OP(name, op, val1, val2) CHECK_OP_LOG(name, op, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:445:48: note: expanded from macro 'CHECK_OP_LOG'
while (::tensorflow::internal::CheckOpString _result{ \
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:306:8: note: 'CheckOpString' is not literal because it is not an aggregate and has no constexpr constructors other than copy or move constructors
struct CheckOpString {
^
In file included from xla/test/cpp/test_tensor.cpp:7:
In file included from xla/test/cpp/cpp_test_util.h:12:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/xla_client/computation_client.h:13:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/client/xla_computation.h:22:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/shape.h:25:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/layout.h:25:
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:495:5: error: variable of non-literal type '::tensorflow::internal::CheckOpString' cannot be defined in a constexpr function
DCHECK_GE(exponent, 0);
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:472:31: note: expanded from macro 'DCHECK_GE'
#define DCHECK_GE(val1, val2) CHECK_GE(val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:459:30: note: expanded from macro 'CHECK_GE'
#define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:452:40: note: expanded from macro 'CHECK_OP'
#define CHECK_OP(name, op, val1, val2) CHECK_OP_LOG(name, op, val1, val2)
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:445:48: note: expanded from macro 'CHECK_OP_LOG'
while (::tensorflow::internal::CheckOpString _result{ \
^
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/core/platform/default/logging.h:306:8: note: 'CheckOpString' is not literal because it is not an aggregate and has no constexpr constructors other than copy or move constructors
struct CheckOpString {
^
In file included from xla/test/cpp/test_tensor.cpp:7:
In file included from xla/test/cpp/cpp_test_util.h:12:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/xla_client/computation_client.h:13:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/client/xla_computation.h:22:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/shape.h:25:
In file included from xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/layout.h:25:
xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/util.h:568:12: error: no matching function for call to 'LsbMask'
Just FYI, I've been observing the `LsbMask` error for a while and haven't been able to resolve it. FWIW, I noticed that another user has opened a related ticket in tensorflow: https://github.com/tensorflow/tensorflow/issues/56021. If you click expand on that issue, you'll see that the error this user is experiencing is:
In file included from ./tensorflow/compiler/xla/array.h:35:
./tensorflow/compiler/xla/util.h:568:12: error: no matching function for call to 'LsbMask'
return LsbMask<uint64_t>(bits);
^~~~~~~~~~~~~~~~~
./tensorflow/compiler/xla/util.h:448:20: note: candidate template ignored: substitution failure [with T = unsigned long long]
constexpr inline T LsbMask(int width)
^
3 errors generated.
`LsbMask` is the error I see locally.
Thanks @wonjoolee95. I included the tensorflow ticket as a blocker to this issue.
The ticket remains open as the issue is blocked on the tensorflow ticket at the moment. Reopening.
I got this error again. Is there any workaround?
You can build with `BUILD_CPP_TESTS=0` to get around this issue.
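A minimal sketch of that workaround, assuming the build is launched from a Python script rather than typed into a shell. Only `BUILD_CPP_TESTS=0` and `python setup.py install` come from this thread; the helper names `env_with` and `run_install` are hypothetical, and the example defaults to a dry run so that nothing is built by accident.

```python
import os
import subprocess
import sys


def env_with(overrides, base=None):
    """Return a copy of the environment with the given variables overridden."""
    env = dict(os.environ if base is None else base)
    env.update({key: str(value) for key, value in overrides.items()})
    return env


def run_install(repo_dir, overrides, dry_run=True):
    """Build the command and environment; start the build only if dry_run is False."""
    cmd = [sys.executable, "setup.py", "install"]
    env = env_with(overrides)
    if not dry_run:
        # Assumption: repo_dir is a checkout that contains setup.py.
        subprocess.run(cmd, cwd=repo_dir, env=env, check=True)
    return cmd, env


# Dry run: show what would be executed without starting a build.
cmd, env = run_install(".", {"BUILD_CPP_TESTS": 0})
print(" ".join(cmd[1:]), "with BUILD_CPP_TESTS =", env["BUILD_CPP_TESTS"])
```

Passing `dry_run=False` with the path to an actual checkout would start the real build with the C++ tests skipped.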
@JackCaoG any more insight on this? opened a new issue https://github.com/tensorflow/tensorflow/issues/56430
We don't run into the issue if we build the pt/xla tests on the newer docker images we published, but it still fails in my local environment. This seems to be a build system issue that is tricky to resolve.
Interesting. I wonder if reverting the constexpr changes can fix this... Waiting for the tensorflow people to weigh in...
Thanks a lot, I made sure to link this issue in the new tf issue
The local build passed for me when I set `DEBUG=0`, which sets the flag in https://github.com/pytorch/xla/blob/master/setup.py#L330
@JackCaoG looks like your setup worked on docker images; correct?
FWIW, for me, the issue persists outside of a docker image when `DEBUG=0`.
What compiler version are you both using?
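One way to collect that information for a report, as a rough sketch: the list of candidate compiler names is an assumption about what might be on `PATH`, and the function name `compiler_versions` is hypothetical.

```python
import shutil
import subprocess


def compiler_versions(candidates=("cc", "c++", "gcc", "g++", "clang", "clang++")):
    """Map each compiler found on PATH to the first line of its --version output."""
    found = {}
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue  # not installed, or not on PATH
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, check=False
        )
        lines = (result.stdout or result.stderr).splitlines()
        found[name] = lines[0] if lines else ""
    return found


for name, version in compiler_versions().items():
    print(f"{name}: {version}")
```

Running it both inside and outside the docker image would make any difference between the two toolchains easy to see.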
According to the response from upstream, https://github.com/tensorflow/tensorflow/issues/56430#issuecomment-1161289033, it may be a version issue. I am trying that now in conda-forge and it seems to be working (so far). We used to get failures around 1 hour into the CI run, but it's been more than 2 hours now without failure...
Edit: it is still failing, but I am still tinkering with versions to see if we can resolve it.
FYI, resolved by https://github.com/tensorflow/tensorflow/commit/bc4521dd193290f86bd5de8a56cefbcbfeae3213
Thanks @ngam for the heads up, I will try to test after we update our tf pins.
Update: I am able to build the tests on a brand new docker image with `DEBUG=0` on Python 3.7.