AmberFF_test fails intermittently

Open dstoeckel opened this issue 8 years ago • 4 comments

The AmberFF_test sometimes fails with the following error message (the numbers are always the same):

checking [EXTRA] Additivity of energies w/ selection... 
    (line 511 TEST_REAL_EQUAL(r4_r1 - r4_i, r1_r4 - r1_i): got -145.471, expected -103.957)  - 
    (line 513 TEST_REAL_EQUAL(r1_r4 - r1_i + r1_tpl + r4_tpl + tpl_i, total_energy): got 1680.36, expected 1638.84)  - 
    FAILED

This hints at some non-determinism in the Amber code, which might be triggered by something like pointers stored in a HashMap.

Tested on Ubuntu 14.04.

dstoeckel avatar Apr 28 '16 15:04 dstoeckel

I had the same problem yesterday under Gentoo (same numbers):

checking [EXTRA] Additivity of energies w/ selection... 
    (line 511 TEST_REAL_EQUAL(r4_r1 - r4_i, r1_r4 - r1_i): got -145.471, expected -103.957)  - 
    (line 513 TEST_REAL_EQUAL(r1_r4 - r1_i + r1_tpl + r4_tpl + tpl_i, total_energy): got 1680.36, expected 1638.84)  - 
    FAILED
FAILED

So far, this only happened once and only with my current working copy of the master branch (incremental build processes), not with clean builds. The latter might be a coincidence, though.

tkemmer avatar Apr 28 '16 16:04 tkemmer

I think this is a rather bad one. I am not sure yet, but I believe it is an effect of the setup handling. Selection is handled differently depending on whether it was switched on before or after the call to setup. Calling select before setup removes all non-selected atoms from the force field entirely. Calling it after setup computes the energies and forces only on the selected atoms, but still takes the non-selected atoms into account. Unfortunately, this mechanism is based on time stamps: if the clock rewinds after calling setup, the logic breaks. We need to fix this, but I don't think it is a regression, so it should not block 1.5.
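
To make the failure mode concrete, here is a minimal sketch of the kind of time-stamp comparison described above. It is not BALL's actual code: `Stamp`, `SelectionMode`, `classify`, and `LogicalClock` are hypothetical names introduced only for illustration, and the assumption is that the force field decides between the two behaviours purely by comparing a selection stamp with a setup stamp.

```cpp
#include <cstdint>

// Hypothetical stand-in for a time stamp; NOT BALL's actual class.
using Stamp = std::int64_t;

// The two behaviours described in the comment above.
enum class SelectionMode {
    RemoveUnselectedAtoms,   // select was called before setup
    RestrictToSelectedAtoms  // select was called after setup
};

// Assumed decision rule: an older selection stamp means
// "select happened before setup".
inline SelectionMode classify(Stamp selection_stamp, Stamp setup_stamp) {
    return selection_stamp < setup_stamp
        ? SelectionMode::RemoveUnselectedAtoms
        : SelectionMode::RestrictToSelectedAtoms;
}

// A logical clock as one possible alternative to wall-clock time:
// every call returns a strictly larger value, so the call order
// is always recoverable from the stamps.
class LogicalClock {
public:
    Stamp next() { return ++counter_; }
private:
    Stamp counter_ = 0;
};
```

If setup is stamped at 1000 and a later select is stamped at 990 because the wall clock stepped backwards, `classify(990, 1000)` reports `RemoveUnselectedAtoms` even though select was called after setup. With stamps drawn from a counter such as `LogicalClock`, that inversion cannot occur. Whether such a counter fits BALL's existing time-stamp handling is an open question, not something this thread settles.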

anhi avatar Apr 29 '16 06:04 anhi

Nevertheless, we should probably fix the test. I am not sure whether this is an issue on the CI servers, but I imagine that randomly failing builds are no fun.

If it really is a timestamp issue, we could add short sleeps between select and setup to decrease the likelihood of things breaking. In Amber, we could check the timestamps for non-monotonicity and at least issue a warning.
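
The non-monotonicity check could look roughly like the sketch below. It assumes the stamps are wall-clock readings. `MonotonicityGuard` is a hypothetical helper, not an existing BALL class, and the warning text is made up for illustration.

```cpp
#include <chrono>
#include <iostream>

// Hypothetical helper, NOT part of BALL: remembers the previous
// wall-clock reading and reports when a new one is not strictly later.
class MonotonicityGuard {
public:
    using TimePoint = std::chrono::system_clock::time_point;

    // Returns true if `now` is strictly later than the previous reading
    // (or if this is the first reading). Otherwise prints a warning to
    // stderr and returns false.
    bool check(TimePoint now) {
        const bool ok = !has_last_ || now > last_;
        if (!ok) {
            std::cerr << "warning: time stamp did not advance; "
                         "select/setup ordering may be misjudged\n";
        }
        last_ = now;
        has_last_ = true;
        return ok;
    }

private:
    TimePoint last_{};
    bool has_last_ = false;
};
```

A caller would pass `std::chrono::system_clock::now()` each time a stamp is taken. Equal readings are also flagged, because a coarse clock resolution makes the order of select and setup just as ambiguous as a rewind does. The sleep idea could be tried in the test with `std::this_thread::sleep_for`, but that only lowers the failure probability and does not remove the dependence on the wall clock.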

dstoeckel avatar Apr 29 '16 08:04 dstoeckel

Following Daniel's suggestion, I added a fix that reduces the number of test failures by roughly a factor of ten (d5c8f851ca175f082617e2dcc2f54bf3ec78a7c5).

philthiel avatar Jan 20 '17 13:01 philthiel