RajalakshmiSR

11 comments by RajalakshmiSR

For smaller values of n (matrix size), performance is better with OMP_PLACES=threads(160) than with explicit places in OMP_PLACES, while for larger n, specifying places gives better numbers. n = 512: ~$ OMP_NUM_THREADS=160 OMP_PLACES="threads(160)"...
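To make the difference between the two OMP_PLACES settings concrete, here is a small Python sketch that lists the places each setting describes. It assumes 160 hardware threads numbered consecutively with 8 per core, and the `{0:8}:20:8` place list is a hypothetical example (one place per core), not the setting used in the measurements above. With `threads(160)` every place is a single hardware thread; with the coarser list, a thread bound to a place may run on any of the eight hardware threads in that place.

```python
# Minimal model of two OMP_PLACES spellings. Assumption: hardware threads are
# numbered 0..N-1 in core order (8 per core, as on an SMT8 POWER core). Real
# OpenMP runtimes map abstract names such as threads(N) onto the actual
# topology, so this only illustrates the shape of the resulting place lists.


def places_threads(n):
    """OMP_PLACES="threads(n)": n places, each holding one hardware thread."""
    return [[t] for t in range(n)]


def places_interval(lower, length, num_places, stride):
    """OMP_PLACES="{lower:length}:num_places:stride": a place of `length`
    consecutive thread ids starting at `lower`, replicated `num_places` times,
    with each copy shifted by `stride` ids."""
    return [
        [lower + p * stride + i for i in range(length)]
        for p in range(num_places)
    ]


if __name__ == "__main__":
    fine = places_threads(160)
    coarse = places_interval(0, 8, 20, 8)  # hypothetical "{0:8}:20:8"
    print(len(fine), fine[:2])     # 160 [[0], [1]]
    print(len(coarse), coarse[1])  # 20 [8, 9, 10, 11, 12, 13, 14, 15]
```

Both settings cover the same 160 hardware threads; they differ only in how finely the runtime is allowed to pin threads.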

Specifying GEMM_PREFERRED_SIZE in param.h for ppc makes no difference for the above test case. However, I will check GEMM_PREFERRED_SIZE more generally for common use cases and add it for...

@Flamefire Do you have an OpenBLAS test to reproduce this? @rafaelcfsousa tried to reproduce this with PyTorch tests, but the tests pass on POWER10. Can you also share...

OpenBLAS 0.3.7 is old, and test_conv_large_cpu now seems to work with newer versions.
```
test_conv_double_backward_no_bias_cpu (__main__.TestNNDeviceTypeCPU) ... ok
test_conv_double_backward_stride_cpu (__main__.TestNNDeviceTypeCPU) ... ok
test_conv_double_backward_strided_with_3D_input_and_weight_cpu (__main__.TestNNDeviceTypeCPU) ... ok
test_Conv2d_groups (__main__.TestNN) ... ok
...
```

It looks like this is happening on POWER9 after https://github.com/flame/blis/commit/ee9ff988c49f16696679d4c6cd3dcfcac7295be7. This is the stack trace:
```
Program received signal SIGSEGV, Segmentation fault.
bli_sgemmtrsmbb_l_power9_ref (k=4, alpha=0x10, a1x=0x64, a11=0x7fffffffccf0, bx1=0x7ffff6ef10c0, b11=0x7ffff6ef1700, c11=0x7fffebd51098, rs_c=140737150003608,...
```

Thanks for fixing this. `make check` now shows only these failures on POWER9.
```
Filename: out.zblat3
******* FATAL ERROR - PARAMETER NUMBER 11 WAS CHANGED INCORRECTLY *******
******* ZHEMM FAILED ON...
```

> @RajalakshmiSR there might be a bug where data is written off the end of the output matrix. Please try the following test program:
>
> It should print:
>...
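The test program referred to in the quote is not included in this excerpt. As a hedged sketch of the general technique for catching writes past the end of an output matrix, the Python below appends sentinel values after C, runs the multiply, and then checks both the result and the sentinels. `reference_gemm`, `run_check`, `GUARD`, and the sentinel value are hypothetical; a real test would call the BLAS routine under test on a buffer laid out the same way.

```python
# Sketch of a guard-region ("canary") check for a GEMM that may write past the
# end of its output matrix. `reference_gemm` is a hypothetical pure-Python
# stand-in for the kernel under test; its `overrun` argument simulates the bug.

GUARD = 16           # sentinel elements placed after C
SENTINEL = -12345.0  # value that a correct GEMM never writes here


def reference_gemm(m, n, k, a, lda, b, ldb, c, ldc, overrun=0):
    """Column-major C = A @ B on flat lists (A is m x k, B is k x n).
    In this sketch `overrun` must not exceed GUARD."""
    for j in range(n):
        for i in range(m):
            c[i + j * ldc] = sum(a[i + p * lda] * b[p + j * ldb] for p in range(k))
    end = (n - 1) * ldc + m  # one past the last element C legitimately owns
    for e in range(end, end + overrun):
        c[e] = 0.0  # simulated out-of-bounds store


def run_check(m, n, k, overrun=0):
    """Return (result_ok, guard_ok) for all-ones A and B."""
    lda, ldb, ldc = m, k, m
    a = [1.0] * (lda * k)
    b = [1.0] * (ldb * n)
    size = ldc * n
    c = [0.0] * size + [SENTINEL] * GUARD
    reference_gemm(m, n, k, a, lda, b, ldb, c, ldc, overrun)
    result_ok = all(x == float(k) for x in c[:size])  # each entry sums k ones
    guard_ok = all(x == SENTINEL for x in c[size:])
    return result_ok, guard_ok


if __name__ == "__main__":
    print(run_check(4, 3, 5))             # (True, True)
    print(run_check(4, 3, 5, overrun=2))  # (True, False)
```

With a correct kernel both flags are True; the simulated overrun leaves the result correct but clobbers the guard, which is the kind of silent corruption a result-only check would miss.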

> @RajalakshmiSR Maybe try compiling with `-fstack-protector-strong` when using GCC > 7. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799 which may be the issue. But just a wild guess.

Still the same issue. Tried it...

@mmatti-sw Can you help with the rebase?

Thanks @martin-frbg for the suggestions. If we find a way to do this, can we include it in OpenBLAS behind some flags?