Many buildbot tests failing on POWER8
Many tests failing on POWER8. For example:
8 - tests.regressions.execution_tree.assign_wrong_variable_245 (Failed)
30 - tests.regressions.python.python.522_multidimension_np_arrays (Timeout)
34 - tests.regressions.python.python.635_arange_args (Failed)
35 - tests.regressions.python.python.636_boolean_array (Failed)
58 - tests.unit.algorithm.simple_lra (Failed)
69 - tests.unit.primitives.advanced_boolean_slicing (Failed)
72 - tests.unit.primitives.broadcast (Failed)
81 - tests.unit.ir.node_data (Failed)
85 - tests.unit.plugins.arithmetics.cumprod (Failed)
86 - tests.unit.plugins.arithmetics.cumsum (Failed)
89 - tests.unit.plugins.arithmetics.maximum (Failed)
90 - tests.unit.plugins.arithmetics.minimum (Failed)
94 - tests.unit.plugins.booleans.and_operation (Failed)
95 - tests.unit.plugins.booleans.equal_operation (Failed)
96 - tests.unit.plugins.booleans.greater_equal_operation (Failed)
97 - tests.unit.plugins.booleans.greater_operation (Failed)
98 - tests.unit.plugins.booleans.less_equal_operation (Failed)
99 - tests.unit.plugins.booleans.less_operation (Failed)
100 - tests.unit.plugins.booleans.not_equal_operation (Failed)
101 - tests.unit.plugins.booleans.nonzero_operation (Failed)
102 - tests.unit.plugins.booleans.or_operation (Failed)
103 - tests.unit.plugins.booleans.unary_not_operation (Failed)
104 - tests.unit.plugins.booleans.where_operation (Failed)
109 - tests.unit.plugins.controls.for_operation (Failed)
122 - tests.unit.plugins.matrixops.arange (Failed)
125 - tests.unit.plugins.matrixops.clip (Failed)
151 - tests.unit.plugins.matrixops.random_distributions (Failed)
165 - tests.unit.plugins.solvers.decomposition (Failed)
167 - tests.unit.plugins.statistics.all_operation (Failed)
168 - tests.unit.plugins.statistics.any_operation (Failed)
187 - tests.unit.python.execution_tree.make_array (Timeout)
194 - tests.unit.python.execution_tree.slice (Failed)
195 - tests.unit.python.primitives.lambda (Failed)
198 - tests.unit.python.primitives.numpy_dtype (Failed)
See: http://ktau.nic.uoregon.edu:8020/#/builders/2/builds/399 http://ktau.nic.uoregon.edu:8020/#/builders/4/builds/161 http://ktau.nic.uoregon.edu:8020/#/builders/9/builds/139
From what I see, those are mostly std::bad_alloc exceptions being thrown. This should be easy enough to find.
Would it make sense to build them with the system allocator instead of TCMalloc?
@khuck that would be worth a try. God knows, tcmalloc might be broken on Power.
@hkaiser nope, same errors with system allocator.
@khuck Same game: could you get gdb stack backtraces for the std::bad_alloc that is thrown?
@hkaiser sorry it took so long... here's the backtrace:
[Switching to Thread 0x3fff9f7ef010 (LWP 135154)]
Catchpoint 1 (exception thrown), 0x00003fffb4fdd188 in __cxa_throw ()
from /storage/users/khuck/src/hpx/spack/opt/spack/linux-rhel7-ppc64le/gcc-4.8.5/llvm-7.0.0-dc22vkqqj3dmm4a6dpayi5jp7btyjntn/lib/libc++abi.so.1
(gdb) bt
#0 0x00003fffb4fdd188 in __cxa_throw ()
from /storage/users/khuck/src/hpx/spack/opt/spack/linux-rhel7-ppc64le/gcc-4.8.5/llvm-7.0.0-dc22vkqqj3dmm4a6dpayi5jp7btyjntn/lib/libc++abi.so.1
#1 0x000000001002095c in allocate_backend (size=4028, alignment=4)
at /home/users/khuck/src/phylanx/tools/buildbot/src/blaze-head/blaze/util/Memory.h:92
#2 allocate<int> (size=1007)
at /home/users/khuck/src/phylanx/tools/buildbot/src/blaze-head/blaze/util/Memory.h:159
#3 DynamicVector (this=<optimized out>, n=1007)
at /home/users/khuck/src/phylanx/tools/buildbot/src/blaze-head/blaze/math/dense/DynamicVector.h:542
#4 generate<int> (this=<optimized out>, n=1007, min=<optimized out>, max=<optimized out>)
at /home/users/khuck/src/phylanx/tools/buildbot/src/blaze-head/blaze/math/DynamicVector.h:129
#5 test_unary_not_operation_1d ()
at /home/users/khuck/src/phylanx/tests/unit/plugins/booleans/unary_not_operation.cpp:88
#6 0x000000001002b6f8 in main (argc=<optimized out>, argv=<optimized out>)
at /home/users/khuck/src/phylanx/tests/unit/plugins/booleans/unary_not_operation.cpp:285
Also... the man page for posix_memalign lists two possible errors; the first is more likely:
ERRORS
EINVAL The alignment argument was not a power of two, or was not a multiple of sizeof(void *).
ENOMEM There was insufficient memory to fulfill the allocation request.
Obviously, 4 is not a multiple of sizeof(void*), which is 8 on POWER9, even though sizeof(int) is 4.
@khuck, so it's a Blaze bug, as it seems to detect the alignment incorrectly. Blaze might not even support Power ;-) I'll ask.
@hkaiser I threw in a check to make sure alignment wasn't less than sizeof(void*). It's rebuilding...
@hkaiser Wouldn't this error happen on any 64bit system? At least those where sizeof(int) == 4 and sizeof(void*) == 8?
@khuck Blaze determines the alignment for a given type T from std::alignment_of<T>::value (https://bitbucket.org/blaze-lib/blaze/src/b742cdd522a0bf0c4c578bdf2ec3a7d5f40ec910/blaze/util/typetraits/AlignmentOf.h?at=master&fileviewer=file-view-default#AlignmentOf.h-67), which looks like the right thing to do.
The failing allocation tries to allocate std::uint8_ts (see: https://github.com/STEllAR-GROUP/phylanx/blob/master/tests/unit/plugins/booleans/unary_not_operation.cpp#L88), so aligning it at four-byte boundaries doesn't seem wrong either.
@khuck could you set a breakpoint at the line in the test (see above) and see what arguments the posix_memalign (https://bitbucket.org/blaze-lib/blaze/src/b742cdd522a0bf0c4c578bdf2ec3a7d5f40ec910/blaze/util/Memory.h?at=master&fileviewer=file-view-default#Memory.h-90) is invoked with?
@hkaiser I already did: it is called with alignment = 4 and size = 4028, the same values that are passed to allocate_backend().
That's why I was saying that the call to posix_memalign() will fail, because 4 < 8. The alignment can't be less than sizeof(void*). From the man page:
The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *).
@khuck so the minimal alignment on Power is 8, no matter what? Why does std::alignment_of<T>::value return 4, then? This doesn't make sense. Or the left hand (POSIX) doesn't know what the right hand (the C++ library) does. If that's the case, then this is a bug in the C++ library implementation on Power.
With this test program:
#include <iostream>
#include <type_traits>  // std::alignment_of

int main (int argc, char** argv) {
    std::cout << "Size of void* : " << sizeof(void*) << std::endl;
    std::cout << "Size of int : " << sizeof(int) << std::endl;
    std::cout << "Size of int* : " << sizeof(int*) << std::endl;
    std::cout << "Size of std::alignment_of<int>::value : " << std::alignment_of<int>::value << std::endl;
    std::cout << "Size of std::alignment_of<int>() : " << std::alignment_of<int>() << std::endl;
}
I compile with Clang++ 7 on POWER, and I get:
Size of void* : 8
Size of int : 4
Size of int* : 8
Size of std::alignment_of<int>::value : 4
Size of std::alignment_of<int>() : 4
And compiled with g++ 7 on x86_64, I get the same output:
Size of void* : 8
Size of int : 4
Size of int* : 8
Size of std::alignment_of<int>::value : 4
Size of std::alignment_of<int>() : 4
@hkaiser submitted to Blaze: https://bitbucket.org/blaze-lib/blaze/issues/232/runtime-error-on-ibm-power
FWIW, here is @khuck's patch to blaze/util/Memory.h (the comment and the alignment adjustment in the #else branch, around line 91 of the file):
inline byte_t* allocate_backend( size_t size, size_t alignment )
{
   void* raw( nullptr );

#if BLAZE_WIN64_PLATFORM || BLAZE_MINGW64_PLATFORM
   raw = _aligned_malloc( size, alignment );
   if( raw == nullptr ) {
#elif BLAZE_MINGW32_PLATFORM
   raw = __mingw_aligned_malloc( size, alignment );
   if( raw == nullptr ) {
#else
   // make sure alignment is not less than sizeof(void*), should be made specific to Power
   alignment = (alignment < sizeof(void*) ? sizeof(void*) : alignment);
   if( posix_memalign( &raw, alignment, size ) ) {
#endif
      BLAZE_THROW_BAD_ALLOC;
   }

   return reinterpret_cast<byte_t*>( raw );
}