BBToolbox unit test failure on Summit

Open: scallag opened this issue 5 years ago • 1 comment

The BBToolbox unit test fails on Summit:

FAIL: test_bbtoolbox (test_bbtoolbox.TestBBToolbox)

Traceback (most recent call last):
  File "/gpfs/alpine/geo112/proj-shared/CyberShake/software/bbp/bbp-19.8.0-python3/bbp/tests/test_bbtoolbox.py", line 90, in test_bbtoolbox
    (ref_file))
AssertionError: True is not false : output HF BBP /gpfs/alpine/proj-shared/geo112/CyberShake/software/bbp/bbp_data/tmpdata/854229/854229.s02.bbp file does not match reference hf bbp file /gpfs/alpine/proj-shared/geo112/CyberShake/software/bbp/bbp-19.8.0-python3/bbp/tests/ref_data/sdsu/s02.bbp

My Summit environment is:

[[email protected] tests]$ module list

Currently Loaded Modules:

  1) hsi/5.0.2.p5     6) python/3.6.6-anaconda3-5.3.0
  2) xalt/1.1.4       7) gcc/6.4.0
  3) lsf-tools/2.0    8) spectrum-mpi/10.3.0.1-20190611
  4) DefApps          9) fftw/3.3.8
  5) cuda/10.1.168

scallag, Feb 05 '20 17:02

Also seeing this on our Compute Canada Graham cluster.

======================================================================
FAIL: test_bbtoolbox (test_bbtoolbox.TestBBToolbox)
Test SDSU BBToolbox code
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/project/6001152/bbp/bbp-22.4.0/bbp/tests/test_bbtoolbox.py", line 103, in test_bbtoolbox
    self.assertFalse(cmp_bbp.cmp_bbp(ref_file, hybfile,
AssertionError: True is not false : output HF BBP /home/tyson/projects/def-tyson/bbp/bbp_data/tmpdata/5202051/5202051.s02.bbp  file does not match reference hf bbp file /home/tyson/projects/def-tyson/bbp/bbp-22.4.0/bbp/tests/ref_data/sdsu/s02.bbp

Running the test by hand gives the following output, which continues until it hits the limit of 1000 reported differences:

Line 500: 0.002816 and 0.017900 differ by more than 0.010000 tolerance.
Line 501: 0.007445 and 0.042666 differ by more than 0.010000 tolerance.
Line 501: 0.001851 and -0.017066 differ by more than 0.010000 tolerance.
Line 502: 0.007916 and 0.021410 differ by more than 0.010000 tolerance.
Line 502: 0.014223 and 0.074031 differ by more than 0.010000 tolerance.
Line 502: 0.003547 and -0.028766 differ by more than 0.010000 tolerance.
...
Line 1002: 0.143870 and 1.733500 differ by more than 0.010000 tolerance.
Line 1002: 0.322270 and -10.789000 differ by more than 0.010000 tolerance.

Looking into the two files (the reported line 500 is actually line 508, as the count doesn't seem to include the comment lines at the top) shows they differ wildly after becoming non-zero at time step 6.22367:

...
  6.21117    0.00000E+00    0.00000E+00    0.00000E+00
  6.22367    0.28447E-03    0.46319E-03    0.11456E-03
  6.23617    0.16687E-02    0.28161E-02    0.69819E-03
  6.24866    0.42707E-02    0.74455E-02    0.18512E-02
...
 12.50983    0.14387E+00   -0.22468E+01    0.32227E+00
...

...
  6.21117    0.00000E+00    0.00000E+00    0.00000E+00
  6.22367    0.94099E-03    0.32967E-02   -0.13993E-02
  6.23617    0.51307E-02    0.17900E-01   -0.73738E-02
  6.24866    0.12283E-01    0.42666E-01   -0.17066E-01
...
 12.50983    0.17335E+01    0.26260E+01   -0.10789E+02
...
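
For anyone who wants to reproduce the line numbering above, here is a rough sketch of a tolerance-based comparison. It is not the actual `cmp_bbp` code. It assumes that comment lines start with `#` or `%`, that the tolerance is an absolute difference per value, and that reported line numbers count data lines only; the names `read_bbp_rows` and `compare_bbp` are made up for this example. The sample rows are the ones quoted above, with two placeholder comment lines added on top.

```python
import io


def read_bbp_rows(stream):
    """Yield (file_line_no, values) for each data line.

    Assumption: blank lines and lines starting with '#' or '%' are comments.
    """
    for file_line_no, line in enumerate(stream, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "%")):
            continue
        yield file_line_no, [float(tok) for tok in stripped.split()]


def compare_bbp(stream_a, stream_b, tolerance=0.01, max_diffs=1000):
    """Return (data_line_no, file_line_no, a, b) for values differing by more than tolerance."""
    diffs = []
    rows = zip(read_bbp_rows(stream_a), read_bbp_rows(stream_b))
    for data_line_no, ((file_line_no, vals_a), (_, vals_b)) in enumerate(rows, start=1):
        for a, b in zip(vals_a, vals_b):
            if abs(a - b) > tolerance:
                diffs.append((data_line_no, file_line_no, a, b))
                if len(diffs) >= max_diffs:
                    return diffs  # stop at the reporting limit
    return diffs


# Rows quoted in this issue; the two comment lines are placeholders.
FILE_A = """# placeholder comment
# placeholder comment
  6.21117    0.00000E+00    0.00000E+00    0.00000E+00
  6.22367    0.28447E-03    0.46319E-03    0.11456E-03
  6.23617    0.16687E-02    0.28161E-02    0.69819E-03
  6.24866    0.42707E-02    0.74455E-02    0.18512E-02
"""

FILE_B = """# placeholder comment
# placeholder comment
  6.21117    0.00000E+00    0.00000E+00    0.00000E+00
  6.22367    0.94099E-03    0.32967E-02   -0.13993E-02
  6.23617    0.51307E-02    0.17900E-01   -0.73738E-02
  6.24866    0.12283E-01    0.42666E-01   -0.17066E-01
"""

diffs = compare_bbp(io.StringIO(FILE_A), io.StringIO(FILE_B))
for data_line_no, file_line_no, a, b in diffs:
    print(f"Data line {data_line_no} (file line {file_line_no}): "
          f"{a:.6f} and {b:.6f} differ by more than 0.010000 tolerance.")
```

On these four rows the sketch flags one value at 6.23617 and two at 6.24866, the same pattern as the reported "Line 500" and "Line 501" entries, and the file line number is offset from the data line number by the number of comment lines.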

twhitehead, Feb 01 '23 15:02