Albany

green-3-20km first-order stokes performance test failing on blake failed comparison

Open jewatkins opened this issue 4 years ago • 6 comments

Failed comparison: https://sems-cdash-son.sandia.gov/cdash/testDetails.php?test=1293180&build=19222 -6.995108075090e+00 != -7.005509894455e+00

This started happening 6/8: https://sems-cdash-son.sandia.gov/cdash/testDetails.php?test=1193162&build=17539 ** Trilinos git commit id - 17eccd2 ** Albany git commit id --- 00b139c

And passed on 6/7: https://sems-cdash-son.sandia.gov/cdash/testDetails.php?test=1190960&build=17499 ** Trilinos git commit id - 08b5ee4 ** Albany git commit id --- 4612319

@mperego we spoke about this over email. You tried different machines and you were getting different results. So, it seems this test is pretty sensitive. Should we try reducing the tolerance to see if we can get consistent results?

jewatkins avatar Jul 12 '21 17:07 jewatkins

@jewatkins I'm not convinced it has to do with tolerances, but it could if for some reason this problem is very ill-posed. I tried running this with gcc and I get consistent results before and after the Trilinos and Albany commits. However, the gcc result is different from the Blake ones. If tightening the tolerances (which are already rather tight) doesn't modify the result, then we'll have to find which precise commit, either in Albany or Trilinos, is creating the difference.

mperego avatar Jul 15 '21 15:07 mperego
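
A minimal sketch of the commit search described above, assuming the commits between the last passing and first failing build are available as an ordered list and that each one can be built and tested. The function name `first_bad_commit`, the predicate `is_good`, and the placeholder ids `c1` to `c3` are hypothetical; only the endpoint ids 4612319 and 00b139c come from this thread. In practice `git bisect` does the same job.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first commit for which the test fails.

    Assumes `commits` is ordered oldest to newest, commits[0] is known
    to pass, and commits[-1] is known to fail.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # failure was introduced after mid
        else:
            hi = mid  # failure was introduced at or before mid
    return commits[hi]


# Hypothetical Albany history between the 6/7 (passing) and 6/8 (failing) builds.
# "c1".."c3" are placeholders, not real commit ids.
albany_commits = ["4612319", "c1", "c2", "c3", "00b139c"]

# Stand-in for "build this commit and run the test"; here we pretend c2 broke it.
culprit = first_bad_commit(albany_commits, lambda c: albany_commits.index(c) < 2)
print(culprit)
```

Since both the Trilinos and the Albany commit changed between the two builds, the search would presumably need to be run on one repository while the other is held fixed.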

Okay, I can look at this more later.

jewatkins avatar Jul 16 '21 00:07 jewatkins

Are there any updates on this? Seems these performance tests are the only things failing now in Albany.

ikalash avatar Aug 12 '21 02:08 ikalash

I haven't had a chance to look at this, but I'd be okay with changing the test values.

jewatkins avatar Aug 12 '21 02:08 jewatkins

Ok, thanks for the update @jewatkins.

ikalash avatar Aug 12 '21 17:08 ikalash

I was going to change the value, but realized that would break the GPU tests because they're still showing up as -7.005509894459e+00. The old Blake tests would give -7.005509894460e+00, so this seems like something we will have to spend more time on. Maybe there's some inconsistency between gcc and Intel.

jewatkins avatar Nov 21 '21 20:11 jewatkins
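
To put the numbers quoted in this thread side by side, here is a small sketch that computes their relative differences. It assumes that in the failed comparison the first number is the current Blake result and the second is the stored reference, which the thread suggests but does not state. The tolerance `REL_TOL` is a hypothetical value for illustration, not the one the test actually uses.

```python
import math

# Values quoted in this thread.
BLAKE_NEW = -6.995108075090e+00  # assumed: current Blake result (failing)
REFERENCE = -7.005509894455e+00  # assumed: stored test value
GPU = -7.005509894459e+00        # GPU tests
BLAKE_OLD = -7.005509894460e+00  # old Blake tests

REL_TOL = 1e-6  # hypothetical comparison tolerance


def rel_diff(a: float, b: float) -> float:
    """Relative difference, scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


def passes(value: float, reference: float, rel_tol: float = REL_TOL) -> bool:
    """True if value matches reference within the relative tolerance."""
    return math.isclose(value, reference, rel_tol=rel_tol, abs_tol=0.0)


print(f"new Blake vs reference: {rel_diff(BLAKE_NEW, REFERENCE):.3e}")
print(f"GPU vs old Blake:       {rel_diff(GPU, BLAKE_OLD):.3e}")

# Switching the stored value to the new Blake result would fail the GPU run.
print("GPU passes against new Blake value:", passes(GPU, BLAKE_NEW))
print("GPU passes against current reference:", passes(GPU, REFERENCE))
```

The new Blake result differs from the reference at roughly the 1e-3 level, while the GPU and old Blake results agree to about 1e-13, so under these assumptions no single tight-tolerance reference value can cover both groups.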