BLAS-Tester

Not passing in some tests

Open griloHBG opened this issue 7 years ago • 2 comments

Hi!

I just compiled OpenBLAS and it fails 2 of the BLAS-Tester tests.

This is the Makefile.rule used to compile OpenBLAS

(my processor is an i5 3330 with 1024 KB of L2 cache):

#
#  Beginning of user configuration
#

# This library's version
VERSION = 0.3.0.dev

# If you set the suffix, the library name will be libopenblas_$(LIBNAMESUFFIX).a
# and libopenblas_$(LIBNAMESUFFIX).so. Meanwhile, the soname in shared library
# is libopenblas_$(LIBNAMESUFFIX).so.0.
# LIBNAMESUFFIX = omp

# You can specify the target architecture, otherwise it's
# automatically detected.
# TARGET = PENRYN

# If you want to support multiple architecture in one binary
# DYNAMIC_ARCH = 1

# C compiler including binary type(32bit / 64bit). Default is gcc.
# Don't use Intel Compiler or PGI, it won't generate right codes as I expect.
CC = gcc

# Fortran compiler. Default is g77.
FC = gfortran

# Even you can specify cross compiler. Meanwhile, please set HOSTCC.

# cross compiler for Windows
# CC = x86_64-w64-mingw32-gcc
# FC = x86_64-w64-mingw32-gfortran

# cross compiler for 32bit ARM
# CC = arm-linux-gnueabihf-gcc
# FC = arm-linux-gnueabihf-gfortran

# cross compiler for 64bit ARM
# CC = aarch64-linux-gnu-gcc
# FC = aarch64-linux-gnu-gfortran


# If you use the cross compiler, please set this host compiler.
# HOSTCC = gcc

# If you need 32bit binary, define BINARY=32, otherwise define BINARY=64
BINARY=64

# About threaded BLAS. It will be automatically detected if you don't
# specify it.
# For force setting for single threaded, specify USE_THREAD = 0
# For force setting for multi  threaded, specify USE_THREAD = 1
USE_THREAD = 1

# If you're going to use this library with OpenMP, please comment it in.
# This flag is always set for POWER8. Don't modify the flag 
USE_OPENMP = 1

# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the script.
# NUM_THREADS = 24

# if you don't need to install the static library, please comment it in.
# NO_STATIC = 1

# if you don't need generate the shared library, please comment it in.
# NO_SHARED = 1

# If you don't need CBLAS interface, please comment it in.
# NO_CBLAS = 1

# If you only want CBLAS interface without installing Fortran compiler,
# please comment it in.
# ONLY_CBLAS = 1

# If you don't need LAPACK, please comment it in.
# If you set NO_LAPACK=1, the library automatically sets NO_LAPACKE=1.
# NO_LAPACK = 1

# If you don't need LAPACKE (C Interface to LAPACK), please comment it in.
# NO_LAPACKE = 1

# Build LAPACK Deprecated functions since LAPACK 3.6.0
BUILD_LAPACK_DEPRECATED = 1

# Build RecursiveLAPACK on top of LAPACK
# BUILD_RELAPACK = 1

# If you want to use legacy threaded Level 3 implementation.
# USE_SIMPLE_THREADED_LEVEL3 = 1

# If you want to drive whole 64bit region by BLAS. Not all Fortran
# compiler supports this. It's safe to keep comment it out if you
# are not sure(equivalent to "-i8" option).
# INTERFACE64 = 1

# Unfortunately most of kernel won't give us high quality buffer.
# BLAS tries to find the best region before entering main function,
# but it will consume time. If you don't like it, you can disable one.
NO_WARMUP = 1

# If you want to disable CPU/Memory affinity on Linux.
#NO_AFFINITY = 1

# if you are compiling for Linux and you have more than 16 numa nodes or more than 256 cpus
# BIGNUMA = 1

# Don't use AVX kernel on Sandy Bridge. It is compatible with old compilers
# and OS. However, the performance is low.
# NO_AVX = 1

# Don't use Haswell optimizations if binutils is too old (e.g. RHEL6)
# NO_AVX2 = 1

# Don't use parallel make.
# NO_PARALLEL_MAKE = 1

# Force number of make jobs. The default is the number of logical CPU of the host.
# This is particularly useful when using distcc.
# A negative value will disable adding a -j flag to make, allowing to use a parent
# make -j value. This is useful to call OpenBLAS make from an other project
# makefile
MAKE_NB_JOBS = 5

# If you would like to know minute performance report of GotoBLAS.
# FUNCTION_PROFILE = 1

# Support for IEEE quad precision(it's *real* REAL*16)( under testing)
# QUAD_PRECISION = 1

# Threads are still working for a while after finishing BLAS operation
# to reduce thread activate/deactivate overhead. You can determine
# time out to improve performance. This number should be from 4 to 30
# which corresponds to (1 << n) cycles. For example, if you set to 26,
# thread will be running for (1 << 26) cycles(about 25ms on 3.0GHz
# system). Also you can control this number by THREAD_TIMEOUT
# CCOMMON_OPT	+= -DTHREAD_TIMEOUT=26

# Using special device driver for mapping physically contiguous memory
# to the user space. If bigphysarea is enabled, it will use it.
# DEVICEDRIVER_ALLOCATION = 1

# If you need to synchronize FP CSR between threads (for x86/x86_64 only).
# CONSISTENT_FPCSR = 1

# If any gemm argument m, n or k is less than or equal to this threshold, gemm will be executed
# with a single thread. You can use this flag to avoid the overhead of multi-threading
# in small matrix sizes. The default value is 4.
# GEMM_MULTITHREAD_THRESHOLD = 4

# If you need a sanity check by comparing with reference BLAS. It'll be very
# slow (Not implemented yet).
# SANITY_CHECK = 1

# The installation directory.
PREFIX = /c/OpenBLAS-develop/build

# Common Optimization Flag;
# The default -O2 is enough.
# Flags for POWER8 are defined in Makefile.power. Don't modify COMMON_OPT
# COMMON_OPT = -O2

# gfortran option for LAPACK
# enable this flag only on 64bit Linux and if you need a thread safe lapack library
# Flags for POWER8 are defined in Makefile.power. Don't modify FCOMMON_OPT
# FCOMMON_OPT = -frecursive

# Profiling flags
COMMON_PROF = -pg

# Build Debug version
# DEBUG = 1

# Set maximum stack allocation.
# The default value is 2048. 0 disables stack allocation and may reduce GER and GEMV
# performance. For details, https://github.com/xianyi/OpenBLAS/pull/482
#
# MAX_STACK_ALLOC = 0

# Add a prefix or suffix to all exported symbol names in the shared library.
# Avoid conflicts with other BLAS libraries, especially when using
# 64 bit integer interfaces in OpenBLAS.
# For details, https://github.com/xianyi/OpenBLAS/pull/459
#
# The same prefix and suffix are also added to the library name,
# i.e. you get lib$(SYMBOLPREFIX)openblas$(SYMBOLSUFFIX) rather than libopenblas
#
# SYMBOLPREFIX=
# SYMBOLSUFFIX=

#
#  End of user configuration
#

This is the Makefile.rule used to compile BLAS-Tester:

#
# configuration
#

#
# Default compiler
# Supports gcc & icc.
#
CC = gcc
# CC = icc
FC = gfortran

#
# CPU architecture
#
# For i386 and x86-64 (default)
ARCH = X86
#
# For Loongson CPU
# ARCH = loongson
# 
# For SW1600 CPU
# ARCH = sw1600
#
# For ARM
# ARCH = ARM
#
# For ARM 64-bit
# ARCH = ARM64

#
# define BINARY=32 or BINARY=64
#
BINARY=64


# the path to BLAS library
#
TEST_BLAS = /c/OpenBLAS-develop/build/lib/libopenblas.a

# The size of Level 2 Cache (default = 4M)
# 4M=4194304 6M=6291456 8M=8388608 12M=12582912
# 1M=1048576
L2SIZE = 1048576

# The number of threads (default = 1)
#
NUMTHREADS = 4

# Use OPENMP
#
USE_OPENMP = 1

# BLAS interface is compiled by ifort
#
# F_INTERFACE_INTEL = 1

# Use 64-bit int
#
# INTERFACE64 = 1

# Debug the library
#
DEBUG = 1

# Reference Level-3 BLAS is very slow. If you only want to test
# the performance, please use this flag.
#
ONLY_PERFORMANCE=1

# Don't test i?amin level 1 BLAS function.
# If you test ATLAS, please use this flag
#
# NO_EXTENSION=1 

# To test BLAS invalid reading.
#
#
TEST_INVALID_READ=1

And these are the outputs from xdl3blastst.exe and xzl3blastst.exe, the two tests that failed:

xdl3blastst.exe

--------------------------------- GEMM ----------------------------------
TST# A B    M    N    K ALPHA  LDA  LDB  BETA  LDC  TIME MFLOP SpUp  TEST
==== = = ==== ==== ==== ===== ==== ==== ===== ==== ===== ===== ==== =====
   0 N N  100  100  100   1.0 1000 1000   1.0 1000  0.00   0.0 1.00 -----
   0 N N  100  100  100   1.0 1000 1000   1.0 1000  0.00 2000.1 0.00 FAIL
   1 N N  200  200  200   1.0 1000 1000   1.0 1000  0.00   0.0 1.00 -----
   1 N N  200  200  200   1.0 1000 1000   1.0 1000  0.00 16001.2 0.00 FAIL
   2 N N  300  300  300   1.0 1000 1000   1.0 1000  0.00   0.0 1.00 -----
   2 N N  300  300  300   1.0 1000 1000   1.0 1000  0.00 26998.7 0.00 FAIL
   3 N N  400  400  400   1.0 1000 1000   1.0 1000  0.00   0.0 1.00 -----
   3 N N  400  400  400   1.0 1000 1000   1.0 1000  0.00 64004.6 0.00 FAIL
   4 N N  500  500  500   1.0 1000 1000   1.0 1000  0.00   0.0 1.00 -----
   4 N N  500  500  500   1.0 1000 1000   1.0 1000  0.01 25000.0 0.00 FAIL
   5 N N  600  600  600   1.0 1000 1000   1.0 1000  0.00   0.0 1.00 -----
   5 N N  600  600  600   1.0 1000 1000   1.0 1000  0.02 28799.8 0.00 FAIL
   6 N N  700  700  700   1.0 1000 1000   1.0 1000  0.00   0.0 1.00 -----
   6 N N  700  700  700   1.0 1000 1000   1.0 1000  0.02 36101.1 0.00 FAIL
   7 N N  800  800  800   1.0 1000 1000   1.0 1000  0.00   0.0 1.00 -----
   7 N N  800  800  800   1.0 1000 1000   1.0 1000  0.02 44519.4 0.00 FAIL
   8 N N  900  900  900   1.0 1000 1000   1.0 1000  0.00   0.0 1.00 -----
   8 N N  900  900  900   1.0 1000 1000   1.0 1000  0.04 40497.8 0.00 FAIL
   9 N N 1000 1000 1000   1.0 1000 1000   1.0 1000  0.00   0.0 1.00 -----
   9 N N 1000 1000 1000   1.0 1000 1000   1.0 1000  0.04 55552.2 0.00 FAIL

10 tests run, 0 passed

ERROR:  resid=16594586171.078035, normD=89.889952, normA=28.793478, normB=29.226439, normC=28.989082, eps=2.220446e-015
   resid=16594586171.078035
ERROR:  resid=2773246963.051268, normD=218.792316, normA=56.642434, normB=57.305220, normC=54.731481, eps=2.220446e-015
   resid=2773246963.051268
ERROR:  resid=1094383692.194749, normD=403.595201, normA=81.700942, normB=82.799372, normC=81.839090, eps=2.220446e-015
   resid=1094383692.194749
ERROR:  resid=534562438.250687, normD=596.196173, normA=108.791732, normB=107.565588, normC=107.305327, eps=2.220446e-015
   resid=534562438.250687
ERROR:  resid=305866010.093331, normD=816.988601, normA=135.098887, normB=134.129053, normC=132.769989, eps=2.220446e-015
   resid=305866010.093331
ERROR:  resid=192610482.239102, normD=1069.112441, normA=161.428768, normB=160.691640, normC=160.611920, eps=2.220446e-015
   resid=192610482.239102
ERROR:  resid=138021619.603977, normD=1367.737912, normA=185.091087, normB=185.631790, normC=185.557983, eps=2.220446e-015
   resid=138021619.603977
ERROR:  resid=96740043.055165, normD=1660.811029, normA=213.176325, normB=211.591894, normC=214.262286, eps=2.220446e-015
   resid=96740043.055165
ERROR:  resid=74284120.110187, normD=1991.189904, normA=235.355207, normB=238.370610, normC=239.087816, eps=2.220446e-015
   resid=74284120.110187
ERROR:  resid=56641560.947299, normD=2339.706092, normA=263.700784, normB=267.147553, normC=264.072542, eps=2.220446e-015
   resid=56641560.947299
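
The quantities on each ERROR line appear to be related by a scaled residual of the form resid = normD / (eps * normA * normB * normC * max(M, N, K)). This relation is inferred from the printed numbers, not taken from the BLAS-Tester source, and the function name below is hypothetical. A minimal Python sketch that recomputes the first and last dgemm residuals from the values above:

```python
def scaled_residual(norm_d, norm_a, norm_b, norm_c, eps, m, n, k):
    """Scaled residual as inferred from the ERROR lines (an assumption,
    not the tester's confirmed formula)."""
    return norm_d / (eps * norm_a * norm_b * norm_c * max(m, n, k))


EPS = 2.220446e-15  # eps exactly as printed in the output

# Test 0 (M = N = K = 100) and test 9 (M = N = K = 1000) of xdl3blastst.exe
first = scaled_residual(89.889952, 28.793478, 29.226439, 28.989082,
                        EPS, 100, 100, 100)
last = scaled_residual(2339.706092, 263.700784, 267.147553, 264.072542,
                       EPS, 1000, 1000, 1000)

print(f"test 0: {first:.6e} (reported 16594586171.078035)")
print(f"test 9: {last:.6e} (reported 56641560.947299)")
```

If this reading is right, a result that matched the reference to rounding error would be expected to give a residual many orders of magnitude smaller than the values reported here. The reference rows in the table report 0.0 MFLOP, and ONLY_PERFORMANCE=1 is set in the BLAS-Tester Makefile.rule, so one possible explanation is that the reference result was never computed. That is a hypothesis, not something the output confirms.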

xzl3blastst.exe

----------------------------------- GEMM ---------------------------------------
TST# A B    M    N    K     ALPHA  LDA  LDB      BETA  LDC TIME MFLOP SpUp  TEST
==== = = ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ===== ==== =====
   0 N N  100  100  100  1.0  0.0 1000 1000  1.0  0.0 1000  0.0   0.0 1.00 -----
   0 N N  100  100  100  1.0  0.0 1000 1000  1.0  0.0 1000  0.0 8000.6 0.00 FAIL
   1 N N  200  200  200  1.0  0.0 1000 1000  1.0  0.0 1000  0.0   0.0 1.00 -----
   1 N N  200  200  200  1.0  0.0 1000 1000  1.0  0.0 1000  0.0 64004.6 0.00 FAIL
   2 N N  300  300  300  1.0  0.0 1000 1000  1.0  0.0 1000  0.0   0.0 1.00 -----
   2 N N  300  300  300  1.0  0.0 1000 1000  1.0  0.0 1000  0.0 35999.7 0.00 FAIL
   3 N N  400  400  400  1.0  0.0 1000 1000  1.0  0.0 1000  0.0   0.0 1.00 -----
   3 N N  400  400  400  1.0  0.0 1000 1000  1.0  0.0 1000  0.0 39381.7 0.00 FAIL
   4 N N  500  500  500  1.0  0.0 1000 1000  1.0  0.0 1000  0.0   0.0 1.00 -----
   4 N N  500  500  500  1.0  0.0 1000 1000  1.0  0.0 1000  0.0 43474.2 0.00 FAIL
   5 N N  600  600  600  1.0  0.0 1000 1000  1.0  0.0 1000  0.0   0.0 1.00 -----
   5 N N  600  600  600  1.0  0.0 1000 1000  1.0  0.0 1000  0.0 40183.4 0.00 FAIL
   6 N N  700  700  700  1.0  0.0 1000 1000  1.0  0.0 1000  0.0   0.0 1.00 -----
   6 N N  700  700  700  1.0  0.0 1000 1000  1.0  0.0 1000  0.1 43553.6 0.00 FAIL
   7 N N  800  800  800  1.0  0.0 1000 1000  1.0  0.0 1000  0.0   0.0 1.00 -----
   7 N N  800  800  800  1.0  0.0 1000 1000  1.0  0.0 1000  0.2 23674.9 0.00 FAIL
   8 N N  900  900  900  1.0  0.0 1000 1000  1.0  0.0 1000  0.0   0.0 1.00 -----
   8 N N  900  900  900  1.0  0.0 1000 1000  1.0  0.0 1000  0.1 43520.1 0.00 FAIL
   9 N N 1000 1000 1000  1.0  0.0 1000 1000  1.0  0.0 1000  0.0   0.0 1.00 -----
   9 N N 1000 1000 1000  1.0  0.0 1000 1000  1.0  0.0 1000  0.2 43475.9 0.00 FAIL

10 tests run, 0 passed

ERROR:  resid=5808263227.808679, normD=216.237078, normA=55.385994, normB=54.860139, normC=55.180648, eps=2.220446e-015
   resid=5808263227.808679
ERROR:  resid=1074809828.113166, normD=601.797181, normA=106.491473, normB=109.570072, normC=108.054166, eps=2.220446e-015
   resid=1074809828.113166
ERROR:  resid=396457559.182390, normD=1083.087382, normA=161.254057, normB=159.568429, normC=159.385091, eps=2.220446e-015
   resid=396457559.182390
ERROR:  resid=199769392.057709, normD=1661.614599, normA=211.307150, normB=210.324956, normC=210.715331, eps=2.220446e-015
   resid=199769392.057709
ERROR:  resid=111071418.163279, normD=2253.468650, normA=264.344989, normB=264.029114, normC=261.827991, eps=2.220446e-015
   resid=111071418.163279
ERROR:  resid=71719253.247765, normD=3004.049181, normA=315.190676, normB=316.395238, normC=315.265541, eps=2.220446e-015
   resid=71719253.247765
ERROR:  resid=48877783.458658, normD=3768.493740, normA=367.077460, normB=365.469404, normC=369.750940, eps=2.220446e-015
   resid=48877783.458658
ERROR:  resid=35328098.327874, normD=4572.585438, normA=419.474251, normB=417.001526, normC=416.551214, eps=2.220446e-015
   resid=35328098.327874
ERROR:  resid=26326291.949775, normD=5440.816380, normA=468.969008, normB=471.419747, normC=467.777930, eps=2.220446e-015
   resid=26326291.949775
ERROR:  resid=20419316.036365, normD=6425.550719, normA=520.205707, normB=521.716708, normC=522.178614, eps=2.220446e-015
   resid=20419316.036365
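
The MFLOP column in both tables is consistent with a flop count of 2*M*N*K for the real test and 8*M*N*K for the complex test, divided by the elapsed time. This convention is inferred from the printed numbers and is an assumption about how the tester counts operations. Under that assumption, a short Python sketch can recover the elapsed time that the rounded TIME column hides (the function name is hypothetical):

```python
def implied_seconds(m, n, k, mflops, complex_data=False):
    """Elapsed time implied by a reported MFLOP value, assuming
    2*M*N*K flops for real GEMM and 8*M*N*K for complex GEMM."""
    flops = (8 if complex_data else 2) * m * n * k
    return flops / (mflops * 1e6)


# (label, M, N, K, reported MFLOP, complex?) copied from the tables above
rows = [
    ("dgemm test 0", 100, 100, 100, 2000.1, False),
    ("dgemm test 1", 200, 200, 200, 16001.2, False),
    ("dgemm test 9", 1000, 1000, 1000, 55552.2, False),
    ("zgemm test 0", 100, 100, 100, 8000.6, True),
    ("zgemm test 9", 1000, 1000, 1000, 43475.9, True),
]

for label, m, n, k, mflops, is_complex in rows:
    seconds = implied_seconds(m, n, k, mflops, is_complex)
    print(f"{label}: about {seconds * 1e3:.2f} ms")
```

For the smallest sizes the implied time comes out at roughly one millisecond, which suggests the MFLOP figures for those rows are limited by timer resolution and should be read with caution.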

griloHBG, Nov 04 '17 12:11

BTW, in L2SIZE (BLAS-Tester's Makefile.rule), I also tried with 4M.

griloHBG, Nov 04 '17 12:11

Have you solved this problem? I ran into the same issue when I used ONLY_PERFORMANCE=1.

xiaofengF, Feb 07 '20 07:02