Nexus: Fix precision issues in structure manipulation and testing + Unpin numpy version in macos CI
Proposed changes
Now that numpy2 is supported, ensure we test it
What type(s) of changes does this code introduce?
- Testing changes (e.g. new unit/integration/performance tests)
Does this introduce a breaking change?
- No
What systems has this change been tested on?
None. Need to verify success in mac CI.
Checklist
Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is simply a reminder of what we are going to look for before merging your code.
- Yes. This PR is up to date with current the current state of 'develop'
- NA. Code added or changed in the PR has been clang-formatted
- NA. This PR adds tests to cover any new code, or to catch a bug that is being fixed
- NA. Documentation has been added (if appropriate)
Interestingly ntest_nexus_structure fails. We need to do something about the error verbosity:
734/1011 Test #1860: ntest_nexus_structure .............................................................................***Failed 0.88 sec
Test name : structure
Test sublabel : test_min_image_distances
Test exception: "AssertionError: "
Test backtrace:
File "/Users/runner/work/qmcpack/qmcpack/nexus/bin/nxs-test", line 478, in run
self.operation()
File "/Users/runner/work/qmcpack/qmcpack/nexus/bin/nxs-test", line 870, in structure
nunit('min_image_distances')
File "/Users/runner/work/qmcpack/qmcpack/nexus/bin/nxs-test", line 349, in nunit
run_external_unit_test(test_name,unit_test)
File "/Users/runner/work/qmcpack/qmcpack/nexus/bin/nxs-test", line 388, in run_external_unit_test
unit_test()
File "/Users/runner/work/qmcpack/qmcpack/nexus/tests/unit/test_structure.py", line 1306, in test_min_image_distances
assert(set(nti)==set(nti_ref))
^^^^^^^^^^^^^^^^^^^^^^
Test status: fail
SInce 2.2.3 failed, trying 2.2.2 since that is what is in the nightlies.
Update: Not yet successful in reproducing. mac + close approximation to versions in CI installed via brew gave a passing test.
Tried rerunning. Still broken. There is clearly a mac specific difference to solve
735/1011 Test #1882: ntest_nexus_structure .............................................................................***Failed 0.82 sec
Test name : structure
Test sublabel : test_min_image_distances
Test exception: "AssertionError: "
Test backtrace:
File "/Users/runner/work/qmcpack/qmcpack/nexus/bin/nxs-test", line 478, in run
self.operation()
File "/Users/runner/work/qmcpack/qmcpack/nexus/bin/nxs-test", line 870, in structure
nunit('min_image_distances')
File "/Users/runner/work/qmcpack/qmcpack/nexus/bin/nxs-test", line 349, in nunit
run_external_unit_test(test_name,unit_test)
File "/Users/runner/work/qmcpack/qmcpack/nexus/bin/nxs-test", line 388, in run_external_unit_test
unit_test()
File "/Users/runner/work/qmcpack/qmcpack/nexus/tests/unit/test_structure.py", line 1306, in test_min_image_distances
assert(set(nti)==set(nti_ref))
^^^^^^^^^^^^^^^^^^^^^^
Seeing if 2.3.0 &/or if any of a @Kcorbyerd 's fixes might have improved this as a side effect.
=> ntest_nexus_structure still breaks on macos with homebrew deps & numpy 2.3.0 vs 1.26.4
After a lot of digging over the weekend, I think I've finally isolated the issue down to a single function in the code, specifically a dot product in structure.py inside the function Structure.recenter(). The exact issue is in a different output on MacOS versus Linux for a dot product involving floats.
Reproducible Example
import numpy as np
print("NumPy Version: ", np.__version__)
np.set_printoptions(precision=20, floatmode="fixed")
pos_i = np.array([0., 0., 0.], dtype=np.float64)
c = np.array([0.61550000000000071321, 1.06607727205864510900, 7.50000000000000000000], dtype=np.float64)
axinv = np.array(
[
[ 4.06173842404549123586e-01, 0.00000000000000000000e+00, 0.00000000000000000000e+00],
[ 2.34504577250050944004e-01, 4.69009154500102054541e-01, 0.00000000000000000000e+00],
[-4.97419495998112408112e-17, -4.97419495998112408112e-17, 6.66666666666666657415e-02],
], dtype=np.float64
)
dot = np.dot(pos_i-c,axinv)
print(dot)
Linux Output
NumPy Version: 2.3.4
[-0.50000000000000011102 -0.50000000000000011102 -0.50000000000000000000]
MacOS Output
NumPy Version: 2.3.4
[-0.50000000000000000000 -0.50000000000000011102 -0.50000000000000000000]
Result
The result from this discrepancy is magnified because on the next line after the dot product takes place there is a numpy.floor() operation:
u = dot(pos[i]-c,axinv)
pos[i] = dot(u-np.floor(u+.5),axes)+c
and the result is, on Linux an array of [-1, -1, 0], while on Mac it is [0, -1, 0].
Interestingly, there is no structural difference between the graphene sheets produced, simply the numbering has changed. Here I've taken the outputs from both Linux and MacOS and plotted both the unit cells and the atomic positions, then colored the atoms by their atomic index (essentially their line number in an XYZ file).
Linux Graphene
MacOS Graphene
This is only a problem insofar as the ordering of the nearest neighbor list and the order of the neighbors in the list. The actual numeric values of the nearest neighbors are the same. See here the two lists of the nearest neighbors side by side.
Reference MacOS
[ 1 11 9] [ 7 15 9]
[30 0 24] [ 2 24 26]
[ 3 11 13] [ 1 9 11]
[ 2 26 24] [ 4 26 28]
[ 5 15 13] [ 3 11 13]
[ 4 26 28] [ 6 28 30]
[ 7 9 15] [ 5 15 13]
[ 6 30 28] [ 0 30 24]
[ 9 19 17] [15 17 23]
[ 8 6 0] [10 2 0]
[11 19 21] [ 9 17 19]
[10 0 2] [12 2 4]
[13 23 21] [11 19 21]
[12 2 4] [14 6 4]
[15 17 23] [13 23 21]
[14 4 6] [ 8 6 0]
[17 27 25] [23 31 25]
[16 8 14] [18 8 10]
[19 27 29] [17 27 25]
[18 10 8] [20 12 10]
[21 29 31] [19 29 27]
[20 12 10] [22 14 12]
[23 25 31] [21 29 31]
[22 12 14] [16 14 8]
[25 1 3] [31 1 7]
[24 22 16] [26 18 16]
[27 3 5] [25 3 1]
[26 18 16] [28 18 20]
[29 5 7] [27 5 3]
[28 18 20] [30 22 20]
[31 1 7] [29 5 7]
[30 22 20] [24 16 22]
If we visually inspect these entries, we see that for the first Atom in the Reference list, its nearest neighbors are Atoms 1, 11, and 9. Likewise, for the third Atom in the MacOS list, the nearest neighbors are Atoms 1, 9, and 11.
The XYZ coordinates of the first Atom in the reference and the third Atom in the MacOS list from the output of the Nexus test are identical. Therefore, the test should still pass.
Current Test
As I discussed earlier today with @prckent, the solution here is to simply rewrite the test in Nexus to be doubly unordered. Currently, the code is:
g = generate_structure(
structure = 'graphene',
cell = 'prim',
tiling = (4,4,1),
)
# Get the neighbor (index) table, along with sorted distance
# and displacement tables.
nt,dt,vt = g.neighbor_table(distances=True,vectors=True)
# Restrict to 3 nearest neighbors (not including self at index 0)
nt = nt[:,1:4]
dt = dt[:,1:4]
vt = vt[:,1:4]
nt_ref = np.array([
[ 1, 11, 9],
[30, 0, 24],
[ 3, 11, 13],
[ 2, 26, 24],
[ 5, 15, 13],
[ 4, 26, 28],
[ 7, 9, 15],
[ 6, 30, 28],
[ 9, 19, 17],
[ 8, 6, 0],
[11, 19, 21],
[10, 0, 2],
[13, 23, 21],
[12, 2, 4],
[15, 17, 23],
[14, 4, 6],
[17, 27, 25],
[16, 8, 14],
[19, 27, 29],
[18, 10, 8],
[21, 29, 31],
[20, 12, 10],
[23, 25, 31],
[22, 12, 14],
[25, 1, 3],
[24, 22, 16],
[27, 3, 5],
[26, 18, 16],
[29, 5, 7],
[28, 18, 20],
[31, 1, 7],
[30, 22, 20]])
for nti,nti_ref in zip(nt,nt_ref):
assert(set(nti)==set(nti_ref))
#end for
which allows for the elements in each slot of the array to be ambiguously ordered (i.e. as long as the first entry contains 1, 9, and 11, it is valid). The failing of the test is currently that the order of the atomic positions in the list should be identical in the reference and the calculated structure, however this fails to catch when the ordering shifts (in this case due to the NumPy bug).
Proposed Test
I propose that the test should instead be something like this:
ref_set = []
for nti_ref in nt_ref:
ref_set.append(set(nti_ref))
tested = []
for nti in nt:
assert(set(nti) in ref_set and set(nti) not in tested)
tested.append(set(nti))
Preliminary Validation of Proposed Test
import numpy as np
nt_ref = np.array(
[
[ 1, 11, 9],
[30, 0, 24],
[ 3, 11, 13],
[ 2, 26, 24],
[ 5, 15, 13],
[ 4, 26, 28],
[ 7, 9, 15],
[ 6, 30, 28],
[ 9, 19, 17],
[ 8, 6, 0],
[11, 19, 21],
[10, 0, 2],
[13, 23, 21],
[12, 2, 4],
[15, 17, 23],
[14, 4, 6],
[17, 27, 25],
[16, 8, 14],
[19, 27, 29],
[18, 10, 8],
[21, 29, 31],
[20, 12, 10],
[23, 25, 31],
[22, 12, 14],
[25, 1, 3],
[24, 22, 16],
[27, 3, 5],
[26, 18, 16],
[29, 5, 7],
[28, 18, 20],
[31, 1, 7],
[30, 22, 20],
]
)
nt = np.array(
[
[ 7, 15, 9],
[ 2, 24, 26],
[ 1, 9, 11],
[ 4, 26, 28],
[ 3, 11, 13],
[ 6, 28, 30],
[ 5, 15, 13],
[ 0, 30, 24],
[15, 17, 23],
[10, 2, 0],
[ 9, 17, 19],
[12, 2, 4],
[11, 19, 21],
[14, 6, 4],
[13, 23, 21],
[ 8, 6, 0],
[23, 31, 25],
[18, 8, 10],
[17, 27, 25],
[20, 12, 10],
[19, 29, 27],
[22, 14, 12],
[21, 29, 31],
[16, 14, 8],
[31, 1, 7],
[26, 18, 16],
[25, 3, 1],
[28, 18, 20],
[27, 5, 3],
[30, 22, 20],
[29, 5, 7],
[24, 16, 22],
]
)
ref_set = []
for nti_ref in nt_ref:
ref_set.append(set(nti_ref))
tested = []
for nti in nt:
assert(set(nti) in ref_set and set(nti) not in tested)
tested.append(set(nti))
This code, as tested with Python 3.13.1 and NumPy 2.2.1, passes when all of the values of nt are in nt_ref, and fails if there are duplicate entries. This should protect against errors that duplicate atoms, and should ensure that the test proceeds cleanly even if there is a shift in the index of the atomic positions.
The Larger Issue
Now, even though the testing will pass with the newest version of NumPy on MacOS, it still doesn't address the larger error that arises from the discrepancy between the dot product that produced the error in the first place. I think that there is at least a way to address it for the function that currently has varied behavior, Structure.recenter(), which is to revisit the use of numpy.floor() in the function. I think that numpy.floor() was likely chosen since it is a bit more predictable than numpy.round() (specifically because numpy.round() obeys the IEEE754 standard, such that values ending in .5 are rounded to the nearest even number, instead of the conventional rounding up for positive numbers or down for negative numbers), but in order to avoid this sort of error in the future I suggest that every instance of numpy.floor() or numpy.ceil() have its contents rounded to above the limits of 64-bit floating point precision (~1.11e-16).
In practice, this would look something like changing numpy.floor(x) to numpy.floor(numpy.round(x, 14)), which you can see implemented here:
import numpy as np
np.set_printoptions(precision=20, floatmode="fixed")
pos_i = np.array([0., 0., 0.], dtype=np.float64)
c = np.array([0.61550000000000071321, 1.06607727205864510900, 7.50000000000000000000], dtype=np.float64)
axinv = np.array(
[
[ 4.06173842404549123586e-01, 0.00000000000000000000e+00, 0.00000000000000000000e+00],
[ 2.34504577250050944004e-01, 4.69009154500102054541e-01, 0.00000000000000000000e+00],
[-4.97419495998112408112e-17, -4.97419495998112408112e-17, 6.66666666666666657415e-02],
], dtype=np.float64
)
dot = np.dot(pos_i-c,axinv)
dot_plus_point_five = dot+.5
floor = np.floor(dot+.5)
floor_rounded = np.floor(np.round(dot+.5, 14))
print("Result from Dot Product: ", dot)
print("Dot Plus 0.5: ", dot_plus_point_five)
print("Rounded Floor Operation: ", floor_rounded)
print("Regular Floor Operation: ", floor)
Result from Dot Product: [-0.50000000000000011102 -0.50000000000000011102 -0.50000000000000000000]
Dot Plus .5: [-1.11022302462515654042e-16 -1.11022302462515654042e-16 0.00000000000000000000e+00]
Regular Floor Operation: [-1.00000000000000000000 -1.00000000000000000000 0.00000000000000000000]
Rounded Floor Operation: [-0.00000000000000000000 -0.00000000000000000000 0.00000000000000000000]
In this case, what is likely the desired output is indeed achieved with the rounding. It should be noted however that if the true intention is to actually use numpy.floor() to take values on the order of 2^-53 (64 bit float precision) and return the floor, that this is obviously not the correct operation. I expect a bit of debate about the limits of precision here, since it is perhaps possible that a user may wish to retain precision at that low of a level, however I am not aware of any such use cases in either computational chemistry or condensed matter physics, so naturally I think this is an unlikely situation in, if not all, then nearly all possible use cases of Nexus.
I'm happy to get feedback on this, and I expect some valuable insight from @jtkrogel and/or @ye-luo that can push us in the direction of a proper solution if this isn't already sufficient.
Many thanks for identifying the culprits on this many months old problem. As we discussed yesterday there are two issues:
First, the behavior of numpy is not exactly the same between different versions on different platforms. AFAIK they don't have a policy on bitwise reproducibility so whether this was ever safe to assume is an open question.
Second, the algorithms implemented in the Nexus testing checking and in the functions tested here appear to assume identical results to machine precision across different platforms and software versions, which is not safe. Instead the algorithms should be updated to be robust and correct for results varying by at least (say) 10 epsilon. The test case is troublesome because the graphene structure is highly symmetric.
While the first issue with numpy is unfortunate, the correct solution is clearly to update our tests &/or the algorithms. The round() solution looks more elegant but either route can work and will solve this issue on mac and perhaps other platforms. I suggest to check a few more cases to verify it is behaving as desired for e.g. commonly occurring fractions. Key thing in both cases is to add a clear comment in the source explaining why something is being done.
Something your testing also highlighted is the need for more verbose error checking.
I think a good solution is to both update the tests and to update the code with np.round() since that provides a level of redundancy to the code.
I tested the rounding, and it even fixes the issue with the graphene sheet's numbering not starting at the origin:
The issue that I'll be tackling later tonight or tomorrow is with all of the other tests that relied on the previous behavior that is technically "incorrect" like test_embed in test_structure.py
Okay, so I believe I've fixed all the tests that failed with the new changes. As a brief recap of the changes I made to the testing, I've changed two tests in test_structure.py: test_min_image_distances and test_embed.
test_min_image_distances
This test is to check the ability of Nexus to enforce the minimum image convention, and it checks this by looking at the calculated nearest neighbor of every given atom in a tiled graphene sheet. For atoms not on the edge of the unit cell, this is a relatively simple check, as it just compares each of the neighboring atoms to see which has the smallest Euclidean distance to that parent atom. For atoms on the edge of the unit cell however, it cannot simply check the smallest Euclidean distance because while the single unit cell might only show 1 or 2 nearest neighbors, the infinitely tiled crystal or sheet can potentially show that there are other atoms that, when tiled, appear close to the parent atom. The minimum image convention is a method by which the atoms on the edge of the unit cell are translated by some multiple of the cell dimension, and then these new atomic positions are used to check the nearest neighbors. A visual example is provided further below.
Original Graphene Sheet Reference
Original Nearest Neighbor Reference Table (nt_ref)
nt_ref = np.array([
[ 1, 11, 9],
[30, 0, 24],
[ 3, 11, 13],
[ 2, 26, 24],
[ 5, 15, 13],
[ 4, 26, 28],
[ 7, 9, 15],
[ 6, 30, 28],
[ 9, 19, 17],
[ 8, 6, 0],
[11, 19, 21],
[10, 0, 2],
[13, 23, 21],
[12, 2, 4],
[15, 17, 23],
[14, 4, 6],
[17, 27, 25],
[16, 8, 14],
[19, 27, 29],
[18, 10, 8],
[21, 29, 31],
[20, 12, 10],
[23, 25, 31],
[22, 12, 14],
[25, 1, 3],
[24, 22, 16],
[27, 3, 5],
[26, 18, 16],
[29, 5, 7],
[28, 18, 20],
[31, 1, 7],
[30, 22, 20]])
New Graphene Sheet Reference
New Nearest Neighbor Reference Table (nt_ref)
nt_ref = np.array([
[ 1, 7, 31],
[ 0, 2, 10],
[ 1, 3, 25],
[ 2, 4, 12],
[ 3, 5, 27],
[ 4, 6, 14],
[ 5, 7, 29],
[ 0, 6, 8],
[ 7, 9, 15],
[ 8, 10, 18],
[ 1, 9, 11],
[10, 12, 20],
[ 3, 11, 13],
[12, 14, 22],
[ 5, 13, 15],
[ 8, 14, 16],
[15, 17, 23],
[16, 18, 26],
[ 9, 17, 19],
[18, 20, 28],
[11, 19, 21],
[20, 22, 30],
[13, 21, 23],
[16, 22, 24],
[23, 25, 31],
[ 2, 24, 26],
[17, 25, 27],
[ 4, 26, 28],
[19, 27, 29],
[ 6, 28, 30],
[21, 29, 31],
[ 0, 24, 30],
])
Checking the new nearest neighbor table
To ensure that there were no problems with the new checks and that the new reference table was indeed correct, I quickly went through the graphene sheet and visually inspected the atoms, looking for the nearest neighbors. I've provided a colorized and labeled image of the sheet for anyone who would like to double check my work.
test_embed
This test is for embedding a defect in a larger supercell, and has 2 portions of the test that would fail if not changed. The first is to create a local distortion in the lattice, then to embed that local distortion into a larger, pristine cell.
Original Distortion in Small Lattice (gr, gr.pos, gr.axes)
Original Distortion Embedded in Large Lattice (ge, ge.pos, ge.axes)
New Distortion in Small Lattice (gr, gr.pos, gr.axes)
New Distortion Embedded in Large Lattice (ge, ge.pos, ge.axes)
Change made to test
Naturally, I changed the indices of the atoms in the distance calculations (np.linalg.norm()) to reflect the new, properly centered structure.
Original
rnn_max_ref = 2.1076122431022664
rnn_max = np.linalg.norm(gr.pos[1]-gr.pos[30])
assert(value_eq(rnn_max,rnn_max_ref))
rnn_max = np.linalg.norm(ge.pos[1]-ge.pos[28])
assert(value_eq(rnn_max,rnn_max_ref))
New
rnn_max_ref = 2.1076122431022664
rnn_max = np.linalg.norm(gr.pos[0]-gr.pos[1])
assert(value_eq(rnn_max,rnn_max_ref))
rnn_max = np.linalg.norm(ge.pos[0]-ge.pos[1])
assert(value_eq(rnn_max,rnn_max_ref))
Test this please
Thanks for the full fix, Brock. FYI, someone else will need to review + approve before merge since this is built on my original PR.
Will review in a bit
After discussion with Brock (@Kcorbyerd), the test failures appear to stem from deeper inconsistencies in the behavior of the recenter function. The behavior of recenter will be rigorously nailed down first, then the issues presented here will be re-approached.