Many buildbot tests failing on POWER8

Open khuck opened this issue 6 years ago • 17 comments

Many tests failing on POWER8. For example:

	  8 - tests.regressions.execution_tree.assign_wrong_variable_245 (Failed)
	 30 - tests.regressions.python.python.522_multidimension_np_arrays (Timeout)
	 34 - tests.regressions.python.python.635_arange_args (Failed)
	 35 - tests.regressions.python.python.636_boolean_array (Failed)
	 58 - tests.unit.algorithm.simple_lra (Failed)
	 69 - tests.unit.primitives.advanced_boolean_slicing (Failed)
	 72 - tests.unit.primitives.broadcast (Failed)
	 81 - tests.unit.ir.node_data (Failed)
	 85 - tests.unit.plugins.arithmetics.cumprod (Failed)
	 86 - tests.unit.plugins.arithmetics.cumsum (Failed)
	 89 - tests.unit.plugins.arithmetics.maximum (Failed)
	 90 - tests.unit.plugins.arithmetics.minimum (Failed)
	 94 - tests.unit.plugins.booleans.and_operation (Failed)
	 95 - tests.unit.plugins.booleans.equal_operation (Failed)
	 96 - tests.unit.plugins.booleans.greater_equal_operation (Failed)
	 97 - tests.unit.plugins.booleans.greater_operation (Failed)
	 98 - tests.unit.plugins.booleans.less_equal_operation (Failed)
	 99 - tests.unit.plugins.booleans.less_operation (Failed)
	100 - tests.unit.plugins.booleans.not_equal_operation (Failed)
	101 - tests.unit.plugins.booleans.nonzero_operation (Failed)
	102 - tests.unit.plugins.booleans.or_operation (Failed)
	103 - tests.unit.plugins.booleans.unary_not_operation (Failed)
	104 - tests.unit.plugins.booleans.where_operation (Failed)
	109 - tests.unit.plugins.controls.for_operation (Failed)
	122 - tests.unit.plugins.matrixops.arange (Failed)
	125 - tests.unit.plugins.matrixops.clip (Failed)
	151 - tests.unit.plugins.matrixops.random_distributions (Failed)
	165 - tests.unit.plugins.solvers.decomposition (Failed)
	167 - tests.unit.plugins.statistics.all_operation (Failed)
	168 - tests.unit.plugins.statistics.any_operation (Failed)
	187 - tests.unit.python.execution_tree.make_array (Timeout)
	194 - tests.unit.python.execution_tree.slice (Failed)
	195 - tests.unit.python.primitives.lambda (Failed)
	198 - tests.unit.python.primitives.numpy_dtype (Failed)

See: http://ktau.nic.uoregon.edu:8020/#/builders/2/builds/399 http://ktau.nic.uoregon.edu:8020/#/builders/4/builds/161 http://ktau.nic.uoregon.edu:8020/#/builders/9/builds/139

khuck avatar Feb 21 '19 21:02 khuck

From what I see, those are mostly std::bad_alloc exceptions being thrown. This should be easy enough to find.

hkaiser avatar Feb 21 '19 22:02 hkaiser

Would it make sense to build them with the system allocator instead of TCMalloc?

khuck avatar Feb 22 '19 16:02 khuck

Would it make sense to build them with the system allocator instead of TCMalloc?

@khuck that would be worth a try. God knows, tcmalloc might be broken on Power.

hkaiser avatar Feb 22 '19 16:02 hkaiser

@hkaiser nope, same errors with system allocator.

khuck avatar Feb 22 '19 19:02 khuck

@khuck Same game: could you get gdb stack backtraces for the std::bad_alloc that is thrown?

hkaiser avatar Feb 22 '19 20:02 hkaiser

@hkaiser sorry it took so long...here's the backtrace:

[Switching to Thread 0x3fff9f7ef010 (LWP 135154)]
Catchpoint 1 (exception thrown), 0x00003fffb4fdd188 in __cxa_throw ()
   from /storage/users/khuck/src/hpx/spack/opt/spack/linux-rhel7-ppc64le/gcc-4.8.5/llvm-7.0.0-dc22vkqqj3dmm4a6dpayi5jp7btyjntn/lib/libc++abi.so.1
(gdb) bt
#0  0x00003fffb4fdd188 in __cxa_throw ()
   from /storage/users/khuck/src/hpx/spack/opt/spack/linux-rhel7-ppc64le/gcc-4.8.5/llvm-7.0.0-dc22vkqqj3dmm4a6dpayi5jp7btyjntn/lib/libc++abi.so.1
#1  0x000000001002095c in allocate_backend (size=4028, alignment=4)
    at /home/users/khuck/src/phylanx/tools/buildbot/src/blaze-head/blaze/util/Memory.h:92
#2  allocate<int> (size=1007)
    at /home/users/khuck/src/phylanx/tools/buildbot/src/blaze-head/blaze/util/Memory.h:159
#3  DynamicVector (this=<optimized out>, n=1007)
    at /home/users/khuck/src/phylanx/tools/buildbot/src/blaze-head/blaze/math/dense/DynamicVector.h:542
#4  generate<int> (this=<optimized out>, n=1007, min=<optimized out>, max=<optimized out>)
    at /home/users/khuck/src/phylanx/tools/buildbot/src/blaze-head/blaze/math/DynamicVector.h:129
#5  test_unary_not_operation_1d ()
    at /home/users/khuck/src/phylanx/tests/unit/plugins/booleans/unary_not_operation.cpp:88
#6  0x000000001002b6f8 in main (argc=<optimized out>, argv=<optimized out>)
    at /home/users/khuck/src/phylanx/tests/unit/plugins/booleans/unary_not_operation.cpp:285

khuck avatar Feb 22 '19 22:02 khuck

Also... the man page for posix_memalign has two possible errors, the first is more likely:

ERRORS
       EINVAL The alignment argument was not a power of two, or was not a multiple of sizeof(void *).
       ENOMEM There was insufficient memory to fulfill the allocation request.

Obviously, 4 is not a multiple of sizeof(void*), which is 8 on POWER9, even though sizeof(int) is 4.

khuck avatar Feb 22 '19 22:02 khuck

@khuck, so it's a blaze bug as it seems to wrongly detect the alignment size. Blaze might not even support Power ;-) I'll ask

hkaiser avatar Feb 22 '19 23:02 hkaiser

@hkaiser I threw in a check to make sure alignment wasn't less than sizeof(void*). It's rebuilding...

khuck avatar Feb 22 '19 23:02 khuck

@hkaiser Wouldn't this error happen on any 64bit system? At least those where sizeof(int) == 4 and sizeof(void*) == 8?

khuck avatar Feb 22 '19 23:02 khuck

@khuck Blaze determines the alignment for a given type T from std::alignment_of<T>::value (https://bitbucket.org/blaze-lib/blaze/src/b742cdd522a0bf0c4c578bdf2ec3a7d5f40ec910/blaze/util/typetraits/AlignmentOf.h?at=master&fileviewer=file-view-default#AlignmentOf.h-67), which looks like the right thing to do.

The failing allocation tries to allocate std::uint8_ts (see: https://github.com/STEllAR-GROUP/phylanx/blob/master/tests/unit/plugins/booleans/unary_not_operation.cpp#L88), so aligning it at four-byte boundaries doesn't seem to be wrong either.

hkaiser avatar Feb 22 '19 23:02 hkaiser

@khuck could you set a breakpoint at the line in the test (see above) and see what arguments the posix_memalign (https://bitbucket.org/blaze-lib/blaze/src/b742cdd522a0bf0c4c578bdf2ec3a7d5f40ec910/blaze/util/Memory.h?at=master&fileviewer=file-view-default#Memory.h-90) is invoked with?

hkaiser avatar Feb 22 '19 23:02 hkaiser

@hkaiser I already did - they are called with alignment = 4, and size = 4028 - same as what is passed to allocate_backend().

That's why I was saying that the call to posix_memalign() will fail, because 4 < 8. The alignment can't be less than sizeof(void*). From the man page:

The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *).

khuck avatar Feb 22 '19 23:02 khuck

@khuck so the minimal alignment on Power is 8, no matter what? Why does std::alignment_of<T>::value return 4, then? This doesn't make sense. Or the left hand (Posix) doesn't know what the right hand does (C++ library). If that's the case, then this is a bug in the C++ library implementation on Power.

hkaiser avatar Feb 22 '19 23:02 hkaiser

With this test program:

#include <iostream>
#include <type_traits>

int main (int argc, char** argv) {
    std::cout << "Size of void* : " << sizeof(void*) << std::endl;
    std::cout << "Size of int : " << sizeof(int) << std::endl;
    std::cout << "Size of int* : " << sizeof(int*) << std::endl;
    std::cout << "Size of std::alignment_of<int>::value : " << std::alignment_of<int>::value << std::endl;
    std::cout << "Size of std::alignment_of<int>() : " << std::alignment_of<int>() << std::endl;
}

I compile with Clang++ 7 on POWER, and I get:

Size of void* : 8
Size of int : 4
Size of int* : 8
Size of std::alignment_of<int>::value : 4
Size of std::alignment_of<int>() : 4

And compiled with g++ 7 on x86_64, I get the same output:

Size of void* : 8
Size of int : 4
Size of int* : 8
Size of std::alignment_of<int>::value : 4
Size of std::alignment_of<int>() : 4

khuck avatar Feb 22 '19 23:02 khuck

@hkaiser submitted to Blaze: https://bitbucket.org/blaze-lib/blaze/issues/232/runtime-error-on-ibm-power

khuck avatar Feb 22 '19 23:02 khuck

FWIW, here is @khuck's patch (blaze/util/Memory.h at line 91):

 79 inline byte_t* allocate_backend( size_t size, size_t alignment )
 80 {
 81    void* raw( nullptr );
 82 
 83 #if BLAZE_WIN64_PLATFORM || BLAZE_MINGW64_PLATFORM
 84    raw = _aligned_malloc( size, alignment );
 85    if( raw == nullptr ) {
 86 #elif BLAZE_MINGW32_PLATFORM
 87    raw = __mingw_aligned_malloc( size, alignment );
 88    if( raw == nullptr ) {
 89 #else
 90    // make sure alignment is not less than sizeof(void*), should be made specific to Power
 91    alignment = (alignment < sizeof(void*) ? sizeof(void*) : alignment);
 92    if( posix_memalign( &raw, alignment, size ) ) {
 93 #endif 
 94       BLAZE_THROW_BAD_ALLOC;
 95    }  
 96    
 97    return reinterpret_cast<byte_t*>( raw );
 98 }  

hkaiser avatar Feb 23 '19 01:02 hkaiser