WIP: Linear and efficient neighbour finder
This PR implements a linear scaling algorithm to find neighbours.
The algorithm
It is mostly inspired by the method described here. I don't know if it is the same way SIESTA does it.
Basically, you divide the whole unit cell into 3D bins and you "store" each atom in its corresponding bin. Each bin is twice as big as the maximum distance at which you have to look for neighbours. E.g. if the maximum radius is 1.5 Ang, then the bins need to be of size 3 Ang (6 Ang if you are looking for sphere overlaps). That ensures that when you look for the neighbours of a given atom/position you only need to go one bin away in each direction. Therefore, you just need to explore 2**3 = 8 bins for each atom/position instead of checking against all the atoms in the structure.
The concept is the same as what ASE uses in their neighbour list (https://wiki.fysik.dtu.dk/ase/ase/neighborlist.html). The implementation is quite different though, as you will see in the benchmarks below.
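To make the binning idea concrete, here is a minimal, self-contained sketch (a hypothetical helper assuming an orthogonal cell, not the PR's actual code):

```python
# Sketch of the binning step (hypothetical, not the PR's code).
# Assumes an orthogonal cell for simplicity.
import numpy as np

def build_bins(xyz, cell_lengths, R):
    """Assign each atom to a 3D bin of side >= 2*R."""
    bin_size = 2 * R
    nbins = np.maximum(np.floor(cell_lengths / bin_size).astype(int), 1)
    # fractional bin coordinates of every atom
    frac = xyz / (cell_lengths / nbins)
    indices = np.floor(frac).astype(int) % nbins  # wrap atoms outside the cell
    # map (ix, iy, iz) to a flat bin index
    flat = np.ravel_multi_index(indices.T, nbins)
    return flat, nbins

xyz = np.random.rand(100, 3) * 20.0  # 100 atoms in a 20 Ang cubic cell
bins, nbins = build_bins(xyz, np.array([20.0, 20.0, 20.0]), R=1.5)
```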
How to use it
Currently, you have to import the `NeighFinder` class:

```python
from sisl.geom import NeighFinder
```
Then instantiate it with a geometry, which will do the organization of the bins. The parameters `R` and `sphere_overlap` need to be provided here as well, since the size of the bins depends on them.

```python
import sisl

geom = sisl.geom.graphene_nanoribbon(5).tile(3, 0)
finder = NeighFinder(geom, R=1.5, sphere_overlap=False)
```
Once instantiated, you can query the neighbour finder as many times as you want, although the most efficient way is to query it all at once. There are two methods to find neighbours: `find_neighbours` and `find_all_unique_pairs`.

The first one accepts `atoms` as an argument to find the neighbours of only a subset:

```python
>>> finder.find_neighbours(atoms=[10, 13])
[array([12,  8, 13], dtype=int32), array([16, 11, 10], dtype=int32)]
```
You can also choose to return the neighbours as pairs if this is the way you need them, which is faster (splitting to get the above result is actually just an extra internal step):

```python
>>> finder.find_neighbours(atoms=[10, 13], as_pairs=True)
array([[10, 12],
       [10,  8],
       [10, 13],
       [13, 16],
       [13, 11],
       [13, 10]], dtype=int32)
```
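For reference, the internal split mentioned above is roughly equivalent to the following (a sketch, not the PR's exact code; it assumes the pairs are sorted by their first column, as in the output above):

```python
# Grouping (i, j) pairs into per-atom neighbour lists, roughly what
# as_pairs=False does internally (sketch, not the PR's exact code).
import numpy as np

pairs = np.array([[10, 12], [10, 8], [10, 13],
                  [13, 16], [13, 11], [13, 10]], dtype=np.int32)
# pairs are sorted by the first column, so a single split suffices
first, counts = np.unique(pairs[:, 0], return_counts=True)
neighbours = np.split(pairs[:, 1], np.cumsum(counts)[:-1])
# -> [array([12,  8, 13]), array([16, 11, 10])]
```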
Also, if you just want all unique pairs of neighbouring atoms, you can call `find_all_unique_pairs`, which will save time and memory:

```python
>>> finder.find_all_unique_pairs()
array([[ 0, 2],
[ 0, 3],
[ 1, 4],
[ 1, 3],
[ 2, 5],
[ 3, 6],
[ 4, 7],
[ 5, 8],
[ 6, 9],
[ 6, 8],
[ 7, 9],
[ 8, 10],
[ 9, 11],
[10, 12],
[10, 13],
[11, 14],
[11, 13],
[12, 15],
[13, 16],
[14, 17],
[15, 18],
[16, 19],
[16, 18],
[17, 19],
[18, 20],
[19, 21],
[20, 22],
[20, 23],
[21, 24],
[21, 23],
[22, 25],
[23, 26],
[24, 27],
[25, 28],
[26, 29],
[26, 28],
[27, 29]], dtype=int32)
```
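As a usage example, the unique pairs plug directly into a sparse adjacency (bond) matrix, e.g. for tight-binding. A sketch assuming the `geom` and `finder` defined above:

```python
# Build a symmetric adjacency matrix from the unique pairs (usage sketch).
import numpy as np
from scipy.sparse import csr_matrix

pairs = finder.find_all_unique_pairs()
i, j = pairs[:, 0], pairs[:, 1]
ones = np.ones(len(pairs))
adj = csr_matrix((ones, (i, j)), shape=(geom.na, geom.na))
adj = adj + adj.T  # each unique pair contributes both (i, j) and (j, i)
```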
Benchmarks
Here I benchmark the implementation against ASE and a trivial `close` search. The quadratic search is just a loop like:
```python
def quadratic(geom):
    neighs = []
    for at in geom:
        neighs.append(geom.close(at, R=1.5))
```
The ASE call looks like:

```python
from ase.neighborlist import neighbor_list

neighbours = neighbor_list("ij", geom.to.ase(), 1.5, self_interaction=False)
```
and this implementation's call is:

```python
from sisl.geom import NeighFinder

neighbours = NeighFinder(geom, R=1.5, sphere_overlap=False).find_neighbours(None, self_interaction=False, ...)
# or the same but calling find_all_unique_pairs
```
See the full script
```python
from collections import defaultdict
import timeit

from ase.neighborlist import neighbor_list
from sisl.geom import NeighFinder, fcc, graphene_nanoribbon


def quadratic(geom):
    neighs = []
    for at in geom:
        neighs.append(geom.close(at, R=1.5))


what = "fcc"
times = defaultdict(list)
na = []

# Iterate over multiple tiles
for tiles in range(1, 10, 1):
    print(tiles)

    if what == "fcc":
        geom = fcc(1.5, "C").tile(3*tiles, 1).tile(3*tiles, 2).tile(3*tiles, 0)
    elif what == "ribbon":
        geom = graphene_nanoribbon(9).tile(tiles, 0)

    na.append(geom.na)

    # Compute with ASE
    ase_time = timeit.timeit(lambda: neighbor_list("ij", geom.to.ase(), 1.5, self_interaction=False), number=1)
    times["ASE"].append(ase_time)

    # Compute with this implementation (different variants)
    my_time = timeit.timeit(lambda: NeighFinder(geom, R=1.5).find_neighbours(None, as_pairs=True), number=1)
    times["THIS (PAIRS)"].append(my_time)

    my_time = timeit.timeit(lambda: NeighFinder(geom, R=1.5).find_neighbours(None, as_pairs=False), number=1)
    times["THIS (NEIGHS)"].append(my_time)

    my_time = timeit.timeit(lambda: NeighFinder(geom, R=1.5).find_all_unique_pairs(), number=1)
    times["THIS (UNIQUE PAIRS)"].append(my_time)

    # Compute with quadratic search
    quadratic_time = timeit.timeit(lambda: quadratic(geom), number=1)
    times["QUADRATIC"].append(quadratic_time)

# Plotting
import plotly.graph_objs as go

fig = go.Figure()
for key, time in times.items():
    fig.add_scatter(x=na, y=time, name=key)

fig.update_layout(
    xaxis_title="Number of atoms",
    yaxis_title="Time (s)",
    title=f"Finding neighbours on {what}",
    yaxis_showgrid=True,
    xaxis_showgrid=True,
)
```
First, I compare the calculation of neighbors for an fcc system, which I tile uniformly in all directions.

Against the quadratic implementation:

From as little as ~40 atoms, this implementation is faster; beyond that, the quadratic implementation blows up. This is even more critical if the system has periodic boundary conditions. With `nsc=[3,3,3]`:

Note that the implementation in this PR only needs to calculate some offsets if there are periodic boundary conditions and we are at the border of the cell, but the search is still performed on only 8 bins.

Note also that this implementation can scale to huge systems:

It works even better in sparse systems, where the number of candidate neighbors for each atom is much lower since some bins contain no atoms:
Now, comparing with ASE, I have found that their algorithm scales pretty badly with the cell size:

This is (I think) because they store the location of each atom in a dense array of shape `(nbins, max_atoms_per_bin)`. I followed the implementation described here, whose memory usage seems to be much better optimized. In fact, I have checked that this implementation keeps almost constant time if we increase the cell while keeping the number of atoms constant, while ASE's scales linearly with it:
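For context, the memory-friendly alternative to the dense array is the classic linked-cell layout, which is presumably what the `heads`/`counts` arrays mentioned further down implement. A sketch of the idea (hypothetical code, not the PR's):

```python
# Linked-list bin layout: constant memory per atom, the classic alternative
# to a dense (nbins, max_atoms_per_bin) array. Hypothetical sketch.
import numpy as np

def build_linked_bins(bin_of_atom, nbins_total):
    heads = np.full(nbins_total, -1, dtype=np.int64)     # last atom seen per bin
    nxt = np.full(len(bin_of_atom), -1, dtype=np.int64)  # chain to previous atom in the same bin
    counts = np.zeros(nbins_total, dtype=np.int64)       # atoms per bin
    for ia, b in enumerate(bin_of_atom):
        nxt[ia] = heads[b]
        heads[b] = ia
        counts[b] += 1
    return heads, nxt, counts

def atoms_in_bin(b, heads, nxt):
    """Walk the chain of atoms stored in bin b."""
    ia = heads[b]
    while ia != -1:
        yield ia
        ia = nxt[ia]
```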
Work to do
The current implementation only finds neighbors in the unit cell. As I mentioned before, one only needs to apply offsets when needed. I already compute the supercell indices of the bins that are searched, so it shouldn't be much of a problem.

However, I have some doubts:

- When should we look for neighbors in neighboring cells? In principle, periodic conditions in sisl are indicated by `nsc > 1`. However, the user may not know beforehand if there are inter-cell interactions for a given radius. Should we require `nsc` to be 3 anyway?
- Should we return atom indices as supercell indices? If so, that depends on `nsc`, so the user would need to set `nsc` themselves or receive the new values of `nsc` in order to use the indices properly.

Also, if the neighbor search spans more than one neighboring cell (`nsc > 3`), I will make it compute neighbors with the quadratic algorithm.
Cheers!
Could you just share your benchmark script? For completeness' sake ;)
Done! I updated the comment, it is hidden in the "See the full script" collapsible at the top of the benchmarks.
Just a comment. In your ase timing, I think you are also timing the conversion of the sisl object to the ase object. Try and move this conversion out and see if this changes the picture. I get quite good timings with ase all across.
Hmm, I moved the conversion out of the timing and I get almost the same. Could you share which systems you are testing? When I began testing, I only tiled the `fcc` in one direction and that gave similar timings between ASE and this implementation. But when I started tiling all dimensions the ASE implementation started to give very high timings.
As I said, I think the problem is that the ASE implementation scales with cell size (see time vs cell size in the benchmarks).
I fucked it up ;) Nevertheless moving it outside is appropriate :)
I incorporated the distance filtering inside the Fortran loop (previously it was done a posteriori in Python) and now it is even more efficient. I kept the old ones to compare for now (dashed lines in this plot):

Now one can scale up the tiled fcc even more (compare with the previous fcc plot, which showed up to 1.6M atoms):

In fact, I could keep making it larger if it weren't for an integer overflow :sweat_smile:
This is amazing!! I'll have a look asap, but I'll try and get #388 done first.
Ok! I'm going to keep optimizing some things now that I know everything works fine :)
I added (optional) periodic boundary conditions. For now, what is done is to return the supercell indices of the neighbors (in three extra columns).

Also, I added a `find_close` method to find neighbors of space coordinates instead of atoms. I guess it can be useful for grids (?).
This is almost ready; the only two things that need to be done are:

- Implement the edge case of needing a supercell bigger than 3 in some direction.
- Decide about the API:
  - Should we merge all the functionality into a single method?
  - Probably add an option to return bond lengths directly, and coordinate shifts instead of supercell indices.
  - Should this method replace the `Geometry.find_close` method?
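For the bond-length option, something along these lines could work (a sketch assuming the `(i, j, isc_x, isc_y, isc_z)` column layout described above; `geom.xyz` and `geom.cell` are sisl attributes, the helper itself is hypothetical):

```python
import numpy as np

def bond_lengths(geom, neighs):
    """neighs: integer array with columns (i, j, isc_x, isc_y, isc_z)."""
    i, j = neighs[:, 0], neighs[:, 1]
    # Cartesian shift of neighbour j given its supercell offset
    shifts = neighs[:, 2:5] @ geom.cell
    vec = geom.xyz[j] + shifts - geom.xyz[i]
    return np.linalg.norm(vec, axis=1)
```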
Thanks, I still plan on getting to this very interesting and useful thing! :)
I would, however, probably change it to Cython code, since Fortran does not play nice with C arrays (it may incur copies that are not necessary). So I first need to understand what you are doing, and then subsequently change it a bit. :)
Isn't there a way to check if copies are made? I tried to do it and didn't get any warning, but probably it is just because I did it wrong :sweat_smile:
`pip3 install . --f2py-report-copy` should suffice. However, I would still like to use C instead of Fortran, simply because of compatibility issues going forward. numpy distutils will be discontinued in Python 3.12, and then I would need to change to pythran (most likely). However, Cython won't require anything, so using Cython would be ideal.
I'm trying to move this to cython :+1:
Codecov Report

Attention: 208 lines in your changes are missing coverage. Please review.

Comparison is base (09d1388) 86.14% compared to head (7f9b4dc) 85.78%. Report is 7 commits behind head on main.

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #393      +/-   ##
==========================================
- Coverage   86.14%   85.78%   -0.37%
==========================================
  Files         364      367       +3
  Lines       43511    43904     +393
==========================================
+ Hits        37483    37662     +179
- Misses       6028     6242     +214
```
I have a Cython question. Imagine I have the geometry's coordinates in an array called `xyz`, and for a given atom I want to store the three components in another array called, say, `atom_xyz`. Doing this:

```cython
cdef np.ndarray[np.float64_t, ndim=2] xyz
cdef np.ndarray[np.float64_t] atom_xyz

atom_xyz[:] = xyz[3, :]
```

Cython shows Python interaction and it slows down the code. Is there any way of avoiding the loop:

```cython
for i in range(3):
    atom_xyz[i] = xyz[3, i]
```

without hurting performance?
You should generally avoid using `np.ndarray` in Cython code; rather, you should use memoryviews. And if they are temporary arrays, use an allocated array.

Also, I guess you mean:

```cython
atom_xyz[:] = xyz[ia, :]
```

no?

Also, try and use `cython --annotate`; that will give you an HTML page with some indications on where you can optimize ;)
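For the record, the memoryview version of the snippet above can look like this (a minimal sketch; slice assignment between typed memoryviews compiles to a C-level copy):

```cython
# Sketch: typed memoryviews avoid the Python interaction of np.ndarray slicing.
import numpy as np

def get_atom_xyz(double[:, :] xyz, Py_ssize_t ia):
    cdef double[:] atom_xyz = np.empty(3, dtype=np.float64)
    # C-level copy between memoryviews, no Python objects involved
    atom_xyz[:] = xyz[ia, :]
    return np.asarray(atom_xyz)
```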
Yes, I was already doing that! I got that there was Python interaction in this part, but I didn't know that with a memoryview you could do this. I've changed it now and all good!
Ok, everything is working fine and as fast as in Fortran now for the `find_neighbours` method :)

I just have to do the same for the rest of the methods and that would be it. This afternoon I'll work on it and the PR will be ready to discuss the details and merge! Any particular suggestions?
Great job! I'll probably take over and clean things up, also so I know what's happening, I want to learn this one ;)
Just let me know when I should take over! :0
Ok, you can look at it now, everything is implemented :+1:
Ok, some comments:

- When you overwrite `nsc` in the geometry, you'll also change the supercell indices it returns. This will break in some cases; it needs to be fixed in another way.
- Why do you only need to look vertically and horizontally for neighbouring bins? What about the diagonals? I.e. an atom at a bin corner could connect to an atom along the diagonal. So effectively one should look in `3**3` bins, no? Perhaps your tests do not catch these corner cases?
- Would it make sense to allow lower-dimensionality searches, e.g. projecting all atoms onto a plane and searching only in this plane? It seems trivially generalizable.
I am still working on it, so don't push anything ;)
- Can you remind me which part of the code you mean? I can't recall. I don't return converted atomic indices, and the supercell indices as I use and return them have three components (x, y, z), so I don't understand how this could get affected (?)
- I do check the diagonals. For each atom/position I create a cube of 8 bins with potential neighbors (in `get_search_indices`). Could you draw a sketch of which bins I'm missing? I can't see it.
- Off the top of my head I don't know if this is easily generalizable, because depending on the direction of the plane, the created grid of bins would be useless. So perhaps it would be easier to create a new projected geometry and run the algorithm with that one. Which cases do you have in mind?
- Ok, that would explain it, I'll have a deeper look.
- See here: https://www.mdpi.com/1424-8220/21/24/8241?type=check_update&version=1. In that picture you are only taking the red blocks; however, a neighbouring atom could be in the blue or green boxes?
- I have in mind 2D materials with huge vacuum spaces. Or perhaps the internal algorithm could cut the non-periodic directions to the max-min of the coordinates along those directions.
- No, ok, this is the difference between what I implemented and what I had seen others do. The bins here are (at least) double the size of the maximum distance to look for. Then, for each given point/atom I not only find which bin it is in, but also which side of the bin it is in. This allows me to look only to one side along each direction, which means I only have to search through 8 bins (`2**3`) instead of 27 (`3**3`) as in the case where you have to look to both sides.
- Yes, I had already thought about this! Although the vacuum has to be really huge for it to be a problem. One more bin means just one more item in the `heads` list and one more item in the `counts` list, so it is not really that important in terms of memory. Also, dismissing an empty bin during the search happens really fast. Not filling the whole cell with bins would make you implement special cases for when you look for neighbors of a point that is in vacuum. All in all, I think it's not worth it (?)
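To illustrate the trick, a rough sketch of how the 8 candidate bins can be picked (hypothetical code, not the PR's `get_search_indices`):

```python
# Sketch of the 8-bin search: with bins of side >= 2*R, a neighbour can only
# lie on the side of the bin that the query point is closest to.
import numpy as np

def search_bins(frac, nbins):
    """frac: position in fractional bin coordinates, shape (3,)."""
    base = np.floor(frac).astype(int)            # bin containing the point
    # +1 if the point sits in the upper half of the bin, else -1
    side = np.where(frac - base > 0.5, 1, -1)
    # the 2**3 = 8 candidate bins: base and base + side along each axis
    offsets = np.stack(np.meshgrid([0, side[0]], [0, side[1]], [0, side[2]],
                                   indexing="ij"), axis=-1).reshape(-1, 3)
    return (base + offsets) % nbins              # wrap for periodic cells
```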
- Great, clever ;)
- Well, consider a system of (500 Ang)^3 with R = 1.42; that's going to occupy 41 MB for a single one of these arrays (roughly (500 / 2.84)**3 ≈ 5.5e6 bins at 8 bytes each). Not so important for DFT-sized systems, but for TB?
How many of those bins would be useless, though? I imagine you would not have a 500 Ang vacuum 😅

For 2D systems you could specify the max number of bins in each direction (and you would specify 1 in the vacuum one). However, this would not be as useful for slabs. I think we could keep it as is (allowing the `max_bins` parameter) and then, if someone has memory problems, we can change the implementation. However, even for that huge system 41 MB seems fine to handle, doesn't it?