
WIP: Linear and efficient neighbour finder

Open pfebrer opened this issue 3 years ago • 33 comments

This PR implements a linear scaling algorithm to find neighbours.

The algorithm

It is mostly inspired by the method described here. I don't know if it is the same way SIESTA does it.

Basically, you divide the whole unit cell into 3D bins and you "store" each atom in the corresponding bin. Each bin is (at least) double the maximum distance at which you have to look for neighbours. E.g., if the maximum radius is 1.5 Ang, the bins need to be of size 3 Ang (6 Ang if you are looking for sphere overlaps). That ensures that when you look for the neighbours of a given atom/position you only need to go one bin away in each direction. Therefore, you just need to explore 2**3 = 8 bins for each atom/position instead of checking against all the atoms in the structure.
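As a toy sketch of this cell-list idea (illustrative names, not the PR's actual code), one can bin the atoms and then restrict the distance check to nearby bins. For simplicity this sketch searches the full 3x3x3 block of 27 bins rather than the 8-bin trick used in the PR, and it assumes at least 3 bins along every direction:

```python
import numpy as np

def build_bins(xyz, cell_lengths, R):
    """Assign each atom to a 3D bin; bin size is at least 2*R per direction."""
    nbins = np.maximum(np.floor(cell_lengths / (2 * R)).astype(int), 1)
    bin_size = cell_lengths / nbins
    indices = np.floor(xyz / bin_size).astype(int) % nbins
    bins = {}
    for ia, idx in enumerate(map(tuple, indices)):
        bins.setdefault(idx, []).append(ia)
    return bins, bin_size, nbins

def neighbours_of(point, bins, bin_size, nbins, xyz, R):
    """Distance-check only the atoms stored in the 27 surrounding bins.

    Assumes nbins >= 3 along every direction (otherwise wrapped bin
    indices would repeat and atoms would be visited twice)."""
    home = np.floor(point / bin_size).astype(int) % nbins
    found = []
    for shift in np.ndindex(3, 3, 3):
        key = tuple((home + np.array(shift) - 1) % nbins)
        for ia in bins.get(key, []):
            # exclude the point itself (distance 0)
            if 0 < np.linalg.norm(xyz[ia] - point) <= R:
                found.append(ia)
    return found
```

The key point is that the cost per query is proportional to the number of atoms in a handful of bins, not to the total number of atoms, which is what gives the linear scaling.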

The concept is the same as what ASE uses in their neighbour list (https://wiki.fysik.dtu.dk/ase/ase/neighborlist.html). The implementation is quite different though and you will see this in the benchmarks (below).

How to use it

Currently, you have to import the class NeighFinder:

from sisl.geom import NeighFinder

Then instantiate it with a geometry, which will do the organization of the bins. The parameters R and sphere_overlap need to be provided here as well, since the size of the bins depends on them.

geom = sisl.geom.graphene_nanoribbon(5).tile(3, 0)
finder = NeighFinder(geom, R=1.5, sphere_overlap=False)

Once instantiated, you can query the neighbour finder as many times as you want, although the most efficient way is to query it all at once. There are two methods to find neighbours: find_neighbours and find_all_unique_pairs.

The first one accepts atoms as an argument to find only the neighbours for a subset.

finder.find_neighbours(atoms=[10, 13])
[array([12,  8, 13], dtype=int32), array([16, 11, 10], dtype=int32)]

You can also choose to return the neighbours as pairs if that is how you need them, which is faster (splitting to get the above result is actually just an extra internal step):

finder.find_neighbours(atoms=[10,13], as_pairs=True)
array([[10, 12],
       [10,  8],
       [10, 13],
       [13, 16],
       [13, 11],
       [13, 10]], dtype=int32)

Also, if you just want all unique pairs of neighbouring atoms, you can call find_all_unique_pairs, which saves time and memory.

finder.find_all_unique_pairs()
array([[ 0,  2],
       [ 0,  3],
       [ 1,  4],
       [ 1,  3],
       [ 2,  5],
       [ 3,  6],
       [ 4,  7],
       [ 5,  8],
       [ 6,  9],
       [ 6,  8],
       [ 7,  9],
       [ 8, 10],
       [ 9, 11],
       [10, 12],
       [10, 13],
       [11, 14],
       [11, 13],
       [12, 15],
       [13, 16],
       [14, 17],
       [15, 18],
       [16, 19],
       [16, 18],
       [17, 19],
       [18, 20],
       [19, 21],
       [20, 22],
       [20, 23],
       [21, 24],
       [21, 23],
       [22, 25],
       [23, 26],
       [24, 27],
       [25, 28],
       [26, 29],
       [26, 28],
       [27, 29]], dtype=int32)

Benchmarks

Here I benchmark the implementation against ASE and a trivial close search.

The quadratic search is just a loop like:

def quadratic(geom):
    # naive search: ask for neighbours atom by atom over the whole geometry
    neighs = []
    for at in geom:
        neighs.append(geom.close(at, R=1.5))
    return neighs

The ASE call looks like:

from ase.neighborlist import neighbor_list

neighbours = neighbor_list("ij", geom.to.ase(), 1.5, self_interaction=False)

and this implementation call is:

from sisl.geom import NeighFinder

neighbours = NeighFinder(geom, R=1.5, sphere_overlap=False).find_neighbours(None, self_interaction=False, ...)
# or the same but calling find_all_unique_pairs
See the full script
from ase.neighborlist import neighbor_list
from sisl.geom import NeighFinder, fcc, graphene_nanoribbon

import timeit
from collections import defaultdict

def quadratic(geom):
    neighs = []
    for at in geom:
        neighs.append(geom.close(at, R=1.5))

what = "fcc"
times = defaultdict(list)
na = []

# Iterate over multiple tiles
for tiles in range(1, 10, 1):
    print(tiles)
    
    if what == "fcc":
        geom = fcc(1.5, "C").tile(3*tiles, 1).tile(3*tiles, 2).tile(3*tiles, 0)
    elif what == "ribbon":
        geom = graphene_nanoribbon(9).tile(tiles, 0)
    
    na.append(geom.na)
    
    # Compute with ASE
    ase_time = timeit.timeit(lambda: neighbor_list("ij", geom.to.ase(), 1.5, self_interaction=False), number=1)
    times["ASE"].append(ase_time)
    
    # Compute with this implementation (different variants)
    my_time = timeit.timeit(lambda: NeighFinder(geom, R=1.5).find_neighbours(None, as_pairs=True), number=1)
    times["THIS (PAIRS)"].append(my_time)
    
    my_time = timeit.timeit(lambda: NeighFinder(geom, R=1.5).find_neighbours(None, as_pairs=False), number=1)
    times["THIS (NEIGHS)"].append(my_time)
    
    my_time = timeit.timeit(lambda: NeighFinder(geom, R=1.5).find_all_unique_pairs(), number=1)
    times["THIS (UNIQUE PAIRS)"].append(my_time)
    
    # Compute with quadratic search
    quadratic_time = timeit.timeit(lambda: quadratic(geom), number=1)
    times["QUADRATIC"].append(quadratic_time)
    
# Plotting
import plotly.graph_objs as go

fig = go.Figure()
for key, time in times.items():
    fig.add_scatter(
        x=na, y=time, name=key
    )

fig.update_layout(
    xaxis_title="Number of atoms",
    yaxis_title="Time (s)",
    title=f"Finding neighbours on {what}",
    yaxis_showgrid=True,
    xaxis_showgrid=True
)

First, I compare the calculation of neighbors for an fcc system, which I tile uniformly in all directions

Against quadratic implementation:

[Figure: time vs number of atoms, this implementation vs the quadratic search (fcc)]

From as little as 40 atoms, this implementation is faster. After that, the quadratic implementation blows up. This is even more critical if the system has periodic boundary conditions. With nsc=[3,3,3]:

[Figure: same comparison with nsc=[3,3,3]]

Note that the implementation in this PR only needs to calculate some offsets if there are pbc and we are at the border of the cell, but still the search is only performed on 8 bins.

Note also that this implementation can scale to huge systems: [Figure: timings for very large systems]

It works even better in sparse systems, where the number of candidate neighbors for each atom is much lower since some bins don't contain atoms:

[Figure: timings on a sparse system]

Now, comparing with ASE, I have found that their algorithm scales pretty badly with the cell size: [Figure: time vs cell size for ASE]

This is, I think, because they store the location of each atom in a dense array of shape (nbins, max_atoms_per_bin). I followed the implementation described here, whose memory usage seems much better optimized. In fact, I have checked that this implementation keeps an almost constant time if we increase the cell while keeping the number of atoms constant, while ASE's scales linearly with it:

[Figure: time vs cell size at a constant number of atoms, this implementation vs ASE]
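The storage contrast can be sketched as a linked-list scheme (hypothetical names; the PR's actual arrays may differ). Instead of a dense (nbins, max_atoms_per_bin) table, one keeps a "head" atom per bin and a "next" pointer per atom, so memory is O(nbins + natoms) regardless of how atoms cluster:

```python
import numpy as np

def build_linked_list(bin_index, nbins):
    """bin_index[ia] is the flat bin of atom ia.

    heads[b] is the first atom stored in bin b (-1 if empty);
    nxt[ia] is the next atom in the same bin as ia (-1 at the end)."""
    heads = np.full(nbins, -1, dtype=np.int64)
    nxt = np.full(len(bin_index), -1, dtype=np.int64)
    for ia, b in enumerate(bin_index):
        nxt[ia] = heads[b]  # prepend atom ia to the chain of bin b
        heads[b] = ia
    return heads, nxt

def atoms_in_bin(b, heads, nxt):
    """Walk the chain of bin b and collect its atoms."""
    ia = heads[b]
    out = []
    while ia != -1:
        out.append(ia)
        ia = nxt[ia]
    return out
```

An empty bin costs a single -1 entry in heads, which is why sparse or vacuum-heavy systems stay cheap.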

Work to do

The current implementation only finds neighbors in the unit cell. As I mentioned before, one only needs to apply offsets when needed. I already compute the supercell indices of the bins that are searched, so it shouldn't be much of a problem.

However, I have some doubts:

  • When should we look for neighbors in neighboring cells? In principle, periodic conditions in sisl are indicated by nsc > 1. However, the user may not know beforehand if there are inter-cell interactions for a given radius. Should we require nsc to be 3 anyway?
  • Should we return atom indices as supercell indices? If so, that depends on nsc, so the user would need to set nsc themselves or receive the new values of nsc in order to use the indices properly.

Also if the neighbor search spans more than 1 neighboring cell (nsc > 3), I will make it compute neighbors with the quadratic algorithm.

Cheers!

pfebrer avatar Nov 02 '21 12:11 pfebrer

could you just share your benchmark script? For completeness sake ;)

zerothi avatar Nov 02 '21 12:11 zerothi

Done! I updated the comment, it is hidden in the "See the full script" collapsible at the top of the benchmarks.

pfebrer avatar Nov 02 '21 13:11 pfebrer

Just a comment: in your ASE timing, I think you are also timing the conversion of the sisl object to the ASE object. Try and move this conversion out and see if this changes the picture. I get quite good timings with ASE all across.

zerothi avatar Nov 02 '21 14:11 zerothi

Hmm I moved the conversion out of the timing and I get almost the same. Could you share which systems are you testing? When I began testing, I only tiled the fcc in one direction and that gave similar timings between ASE and this implementation. But when I started tiling all dimensions the ASE implementation started to give very high timings.

pfebrer avatar Nov 02 '21 14:11 pfebrer

> Hmm I moved the conversion out of the timing and I get almost the same. Could you share which systems are you testing? When I began testing, I only tiled the fcc in one direction and that gave similar timings between ASE and this implementation. But when I started tiling all dimensions the ASE implementation started to give very high timings.

As I said, I think the problem is that the ASE implementation scales with cell size (see time vs cell size in the benchmarks)

pfebrer avatar Nov 02 '21 14:11 pfebrer

> Hmm I moved the conversion out of the timing and I get almost the same. Could you share which systems are you testing? When I began testing, I only tiled the fcc in one direction and that gave similar timings between ASE and this implementation. But when I started tiling all dimensions the ASE implementation started to give very high timings.

I fucked it up ;) Nevertheless moving it outside is appropriate :)

zerothi avatar Nov 02 '21 14:11 zerothi

I incorporated the distance filtering inside the fortran loop (previously it was done a posteriori in python) and now it is even more efficient. I kept the old ones to compare for now (dashed lines in this plot):

[Figure: timings with distance filtering inside the fortran loop; dashed lines are the previous versions]

Now one can scale up the tiled fcc even more (compare with previous fcc plot which showed up to 1.6M atoms):

[Figure: fcc timings scaled to even larger systems]

In fact, I could keep making it larger if it weren't for an integer overflow :sweat_smile:

pfebrer avatar Nov 03 '21 18:11 pfebrer

This is amazing!! I'll have a look asap, but I'll try and get #388 done first.

zerothi avatar Nov 03 '21 19:11 zerothi

Ok! I'm going to keep optimizing some things now that I know everything works fine :)

pfebrer avatar Nov 03 '21 22:11 pfebrer

I added (optional) periodic boundary conditions. For now, what is done is to return the supercell indices of the neighbors (in three extra columns).

Also, I added a find_close method to find neighbors of space coordinates instead of atoms. I guess it can be useful for grids (?).

pfebrer avatar Nov 04 '21 02:11 pfebrer

This is almost ready, the only two things that need to be done are:

  • Implement the edge case of needing a supercell bigger than 3 in some direction.
  • Decide about the API:
    • Should we merge all the functionality in a single method?
    • Probably add option to return bond length directly and coordinate shifts instead of supercell indices.
    • Should this method replace the Geometry.find_close method?

pfebrer avatar Dec 14 '21 08:12 pfebrer

Thanks, I still plan on getting to this very interesting and useful thing! :)

I would however, probably change it to a Cython code, since fortran does not play nice with C-arrays (it may incur copies that are not necessary. So I first need to understand what you are doing, and then subsequently change it a bit. :)

zerothi avatar Dec 14 '21 08:12 zerothi

Isn't there a way to check if copies are made? I tried to do it and didn't get any warning, but probably it is just because I did it wrong :sweat_smile:

pfebrer avatar Dec 14 '21 08:12 pfebrer

> Isn't there a way to check if copies are made? I tried to do it and didn't get any warning, but probably it is just because I did it wrong :sweat_smile:

pip3 install . --f2py-report-copy

should suffice. However, I would still like to use C instead of fortran, simply because of compatibility issues going forward. numpy.distutils will be discontinued with Python 3.12, and then I would need to change to pythran (most likely). However, Cython won't require anything, so going that way would be ideal.

zerothi avatar Dec 14 '21 08:12 zerothi

I'm trying to move this to cython :+1:

pfebrer avatar Feb 22 '22 16:02 pfebrer

Codecov Report

Attention: 208 lines in your changes are missing coverage. Please review.

Comparison is base (09d1388) 86.14% compared to head (7f9b4dc) 85.78%. Report is 7 commits behind head on main.

Files Patch % Lines
src/sisl/geom/_neighbors/_operations.py 0.00% 194 Missing :warning:
src/sisl/geom/_neighbors/_finder.py 93.02% 12 Missing :warning:
src/sisl/utils/misc.py 88.88% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #393      +/-   ##
==========================================
- Coverage   86.14%   85.78%   -0.37%     
==========================================
  Files         364      367       +3     
  Lines       43511    43904     +393     
==========================================
+ Hits        37483    37662     +179     
- Misses       6028     6242     +214     


codecov[bot] avatar Feb 22 '22 16:02 codecov[bot]

I have a Cython question. Imagine I have the geometry's coordinates in an array called xyz, and for a given atom I want to store the three components in another array called say atom_xyz. Doing this:

cdef np.ndarray[np.float64_t, ndim=2] xyz
cdef np.ndarray[np.float64_t] atom_xyz

atom_xyz[:] = xyz[3, :]

Cython shows Python interaction and it slows down the code. Is there any way of avoiding the loop:

for i in range(3):
    atom_xyz[i] = xyz[3, i]

without hurting performance?

pfebrer avatar Feb 23 '22 03:02 pfebrer

> I have a Cython question. Imagine I have the geometry's coordinates in an array called xyz, and for a given atom I want to store the three components in another array called say atom_xyz. Doing this:
>
> cdef np.ndarray[np.float64_t, ndim=2] xyz
> cdef np.ndarray[np.float64_t] atom_xyz
>
> atom_xyz[:] = xyz[3, :]
>
> Cython shows Python interaction and it slows down the code. Is there any way of avoiding the loop:
>
> for i in range(3):
>     atom_xyz[i] = xyz[3, i]
>
> without hurting performance?

You should generally avoid using np.ndarray in Cython code; rather, you should use memoryviews. And if they are temporary arrays, use a locally allocated array.

Also, I guess you mean:

atom_xyz[:] = xyz[ia, :]

no?
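A minimal sketch of the memoryview approach (illustrative function and names, not the PR's actual code):

```cython
# Typed memoryviews support slice assignment without Python interaction,
# whereas the same assignment on np.ndarray buffers goes through Python.
cdef void copy_atom(double[:, :] xyz, double[:] atom_xyz, Py_ssize_t ia):
    # compiles down to a plain C copy, no Python objects involved
    atom_xyz[:] = xyz[ia, :]
```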

zerothi avatar Feb 23 '22 07:02 zerothi

also, try and use cython --annotate that will give you a html page with some indications on where you can optimize ;)

zerothi avatar Feb 23 '22 09:02 zerothi

Yes, I was already doing that! I got that there was python interaction in this part, but I didn't know that with a memoryview you could do this. I've changed it now and all good!

pfebrer avatar Feb 23 '22 09:02 pfebrer

Ok, everything is working fine and as fast as in fortran now for the find_neighbours method :)

[Figure: find_neighbours timings, Cython version matching fortran]

I just have to do the same for the rest of the methods and that will be it. This afternoon I'll work on it and the PR will be ready to discuss the details and merge! Any particular suggestions?

pfebrer avatar Feb 23 '22 09:02 pfebrer

Great job! I'll probably take over and clean things up, also so I know what's happening, I want to learn this one ;)

zerothi avatar Feb 23 '22 09:02 zerothi

Just let me know when I should take over! :0

zerothi avatar Feb 23 '22 09:02 zerothi

Ok, you can look at it now, everything is implemented :+1:

pfebrer avatar Feb 23 '22 17:02 pfebrer

Ok some comments:

  1. when you overwrite nsc in the geometry, you'll also change the supercell indices it returns; this will break in some cases, so it needs to be fixed in another way
  2. Why do you only need to look vertically and horizontally for neighbouring bins? What about the diagonals? I.e. an atom at the bin corner could connect to an atom along the diagonal? So effectively one should look in D**3 bins, no? Perhaps your tests do not catch these corner cases?
  3. Would it make sense to allow lower dimensionality searches, e.g. projection all atoms onto a plane and search only in this plane? It seems trivially generalized.

I am still working on it, so don't push anything ;)

zerothi avatar Feb 25 '22 10:02 zerothi

  1. Can you remind me which part of the code you mean? I can't recall. I don't return converted atomic indices, and the supercell indices, as I use and return them, have three components (x, y, z), so I don't understand how this could get affected (?)

  2. I do check the diagonals. For each atom/position I create a cube of 8 bins with potential neighbors (in get_search_indices). Could you draw a sketch of which bins I'm missing? I can't see it.

  3. Off the top of my head I don't know if this is easily generalizable, because depending on the direction of the plane, the created grid of bins would be useless. So perhaps it would be easier to create a new projected geometry and run the algorithm with that one. Which cases do you have in mind?

pfebrer avatar Feb 25 '22 10:02 pfebrer

  1. Ok, that would explain it, I'll have a deeper look.
  2. See here https://www.mdpi.com/1424-8220/21/24/8241?type=check_update&version=1 in that picture you are only taking the red blocks, however, a neighbouring atom could be in the blue or green boxes?
  3. I have in mind 2D materials with huge vacuum spaces, or perhaps the internal algorithm could cut the non-periodic directions to max-min of coordinates along those directions.

zerothi avatar Feb 25 '22 10:02 zerothi

  1. No ok, this is the difference between what I implemented and what I had seen others do. The bins here are (at least) double the size of the maximum distance to look for. Then, for each given point/atom I not only find which bin it is in, but also which side of the bin it is in. This allows me to look only to one side along each direction, which means it's only 8 bins (2**3) that I have to search through, instead of 27 (3**3) as in the case where you have to look to both sides.

  2. Yes, I had already thought about this! Although the vacuum has to be really huge for it to be a problem. One more bin means just one more item in the heads list and one more item in the counts list, so it is not really that important in terms of memory. Also, dismissing an empty bin during the search happens really fast. Not filling the whole cell with bins would force you to implement special cases for when you look for neighbors of a point that is in the vacuum. All in all, I think it's not worth it (?)
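The side-of-the-bin trick described in point 1 can be sketched as follows (hypothetical names, not the PR's actual get_search_indices; it assumes bins are at least twice the search radius wide):

```python
import numpy as np

def search_bins(point, bin_size, nbins):
    """Return the 2**3 = 8 bin indices (with wrap-around) to search.

    Knowing in which half of its home bin the point sits along each
    direction, only one neighbouring bin per direction can contain
    neighbours, so 8 bins suffice instead of the usual 27."""
    frac = point / bin_size
    home = np.floor(frac).astype(int)
    # step is +1 if the point is in the upper half of its bin, else -1
    step = np.where(frac - home >= 0.5, 1, -1)
    bins = []
    for shift in np.ndindex(2, 2, 2):
        idx = home + step * np.array(shift)
        bins.append(tuple(idx % nbins))  # wrap for periodic boundaries
    return bins
```

For a point in the lower-left-front octant of bin (0, 0, 0) this yields the home bin plus its 7 lower/wrapped neighbours, never the upper ones.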

pfebrer avatar Feb 25 '22 11:02 pfebrer

  1. Great, clever ;)
  2. Well, consider a system of 500 Ang^3 with R = 1.42; that's going to occupy 41 MB for a single one of these arrays. Not so important for DFT-sized systems, but for TB?

zerothi avatar Feb 25 '22 11:02 zerothi

How many of those bins would be useless though? I imagine you would not have a 500 Ang vacuum 😅

For 2D systems you could specify the max number of bins in each direction (and you would specify 1 in the vacuum one). However, this would not be as useful for slabs. I think we could keep it as is (allowing the max_bins parameter) and then change the implementation if someone runs into memory problems. However, even for that huge system, 41 MB seems fine to handle, doesn't it?

pfebrer avatar Feb 25 '22 12:02 pfebrer