
32bit rounding still failing

Open matt-chan opened this issue 8 years ago • 7 comments

So we're still getting issues with numerical instability when we go to 32bit rounding behaviour.

For example, these builds fail: https://copr.fedorainfracloud.org/coprs/talcite/Horton-2.0.0/build/380108/

Did we ever merge in our FPU fix @tovrstra ?

matt-chan avatar Jul 12 '16 13:07 matt-chan

We kicked it out again for some reason. I don't remember why.

tovrstra avatar Jul 12 '16 14:07 tovrstra

Was it too ugly? I remember that it had to be called before we started anything. Maybe the more permanent fix is to just loosen the tolerances on the failing tests and just accept the fact that i686 and x86_64 rounding will be different...
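A minimal sketch of what "loosening the tolerances" could look like, assuming a mixed relative/absolute criterion rather than a single absolute threshold. The helper names `max_scaled_error` and `assert_close` and the default `rtol`/`atol` values are made up for illustration; they are not part of HORTON.

```python
def max_scaled_error(actual, expected, rtol=1e-9, atol=1e-12):
    """Return the largest |a - e| / (atol + rtol * |e|) over all pairs.

    A value <= 1 means every element is within the combined tolerance.
    """
    return max(
        abs(a - e) / (atol + rtol * abs(e))
        for a, e in zip(actual, expected)
    )


def assert_close(actual, expected, rtol=1e-9, atol=1e-12):
    """Hypothetical test helper: fail only if the tolerance is exceeded."""
    worst = max_scaled_error(actual, expected, rtol, atol)
    assert worst <= 1.0, "worst scaled error %.3g exceeds tolerance" % worst


# Values that differ only by a few units in the last place still pass.
reference = [1.0, 1e-3, 250.0]
perturbed = [x * (1.0 + 4e-16) for x in reference]
assert_close(perturbed, reference)
```

The relative term scales with the magnitude of the reference value, so the same check should work for small and large integrals, while the absolute term keeps it meaningful near zero.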

Another option is to include fpufix.h (I'm pretty sure that's what it was called) in the cpp files that do math. I think numpy has consistent rounding behaviour on 32 and 64bit. We don't actually have that many cpp files.

As it stands though, it's a bit of a problem for Fedora packaging, since i686 will be supported for a while longer, even if they don't classify it as a primary architecture anymore.

matt-chan avatar Jul 14 '16 14:07 matt-chan

I vaguely remember that fpufix just broke other tests. (I'll try to look it up again.)

I'm a bit surprised that it is still a problem. I ran all unit tests of the 2.0.1 release on a fedora 23 32-bit virtual machine without any problems.

I tried the link above but I can't find a list of tests that are failing. Can you give me a direct link to the output that shows which unit tests break?

tovrstra avatar Jul 14 '16 20:07 tovrstra

It was once in the code and I found the reason for removing it again: it broke tests on Debian 32 bit.

Also, fpufix cannot be applied locally to just some files. It applies to the entire process. If you want to use it, it should be called as early as possible to avoid inconsistencies.

Can you give instructions for reproducing the problem locally, e.g. in a virtual machine?

tovrstra avatar Jul 14 '16 21:07 tovrstra

Yes, a 32bit virtual machine should do it. I think that's how we tested it last time.

Matt

matt-chan avatar Jul 14 '16 21:07 matt-chan

These tests fail on Fedora 24 32 bit:

======================================================================
FAIL: horton.gbasis.test.test_boys.test_boys_array
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/toon/.local/lib/python2.7/site-packages/horton/gbasis/test/test_boys.py", line 526, in test_boys_array
    assert output[m] == boys_function(m, t)
AssertionError

======================================================================
FAIL: horton.gbasis.test.test_ints.test_ralpha_repulsion_4_3_2_1
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/toon/.local/lib/python2.7/site-packages/horton/gbasis/test/test_ints.py", line 2823, in test_ralpha_repulsion_4_3_2_1
    result0, -1.0)
  File "/home/toon/.local/lib/python2.7/site-packages/horton/gbasis/test/test_ints.py", line 2497, in check_ralpha_repulsion
    assert abs(result1 - result0).max() < 3e-7
AssertionError

tovrstra avatar Sep 21 '16 13:09 tovrstra

The Boys function (first failing test above) is compared with exact equality, i.e. to full precision, which should not be done. The test may fail on some machines because the two implementations being compared perform their operations in a different order, and differences in rounding will then give slightly different results.
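To illustrate the point, here is a rough sketch that evaluates the Boys function F_m(t) in two mathematically equivalent ways and compares them with a tolerance instead of `==`. The functions `boys_series` and `boys_upward` are written just for this example and are not HORTON's implementations; the value `t = 5.0` and the tolerance `1e-10` are arbitrary choices.

```python
import math


def boys_series(m, t, nterms=200):
    """F_m(t) = exp(-t) * sum_i (2t)^i / ((2m+1)(2m+3)...(2m+2i+1))."""
    term = 1.0 / (2 * m + 1)
    total = term
    for i in range(1, nterms):
        term *= 2.0 * t / (2 * m + 2 * i + 1)
        total += term
    return math.exp(-t) * total


def boys_upward(m, t):
    """Start from the closed form of F_0(t) and recurse upward in m.

    Uses F_{k+1}(t) = ((2k+1) F_k(t) - exp(-t)) / (2t); requires t > 0.
    """
    value = 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))
    exp_mt = math.exp(-t)
    for k in range(m):
        value = ((2 * k + 1) * value - exp_mt) / (2.0 * t)
    return value


t = 5.0
for m in range(4):
    a = boys_series(m, t)
    b = boys_upward(m, t)
    # The last bits may differ between the two routes, so compare with a
    # relative tolerance rather than exact equality.
    assert math.isclose(a, b, rel_tol=1e-10)
```

Whether `a == b` holds exactly can depend on the platform and compiler settings, which is why an exact comparison makes for a fragile test.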

tovrstra avatar Oct 14 '16 15:10 tovrstra