sisl
Reuse orbital products in density calculation for multiple DMs
If one wants to calculate the `LDOS(E)` from a set of density matrices for different energies, one needs to call `dm.density()` for each energy, which results in the orbital products being recomputed for each energy. This makes it very difficult to compute the energy-resolved LDOS in practice.
It would be nice if the orbital products could be calculated once and then used for all density matrices. The two naive approaches would be:

- Save the orbital products and pass them to `density` calls. This seems unfeasible because of the memory required, since the shape of the products would be `(grid_x, grid_y, grid_z, orb_pair)`, with `orb_pair ~ n_orbitals ** 2 / 2` (I think). Although perhaps they could be stored in a sparse manner, and then it would be doable (?).
- Have a `density` function that accepts multiple DMs and computes all the densities at the same time. If you need to calculate the density for the whole grid, this would also require a lot of memory: `(grid_x, grid_y, grid_z, nE)`.
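For a sense of scale, here is a back-of-the-envelope estimate of the memory the two options would need. All numbers (grid shape, orbital count, number of energies) are purely illustrative, not taken from any real calculation:

```python
import numpy as np

# Illustrative sizes (hypothetical, not from a real calculation)
grid_shape = (200, 200, 200)   # real-space grid
n_orbitals = 2000              # number of orbitals
nE = 50                        # number of energies / density matrices

grid_points = np.prod(grid_shape)

# Option 1: dense orbital products, shape (grid_x, grid_y, grid_z, orb_pair)
orb_pairs = n_orbitals * (n_orbitals + 1) // 2
option1_gib = grid_points * orb_pairs * 8 / 1024**3   # float64 bytes -> GiB

# Option 2: one full grid per energy, shape (grid_x, grid_y, grid_z, nE)
option2_gib = grid_points * nE * 8 / 1024**3

print(f"option 1 (dense products): {option1_gib:.1e} GiB")  # clearly infeasible
print(f"option 2 (grid per energy): {option2_gib:.1f} GiB")
```

The dense-product option blows up by orders of magnitude, while the per-energy grids are merely expensive, which matches the concern above.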
In my particular case, I want to average all grids over X and Y, ending with a final array of `(grid_z, nE)`, so perhaps there is some way of avoiding the massive memory requirements?
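For that use case, one could in principle reduce each energy's grid to its XY-average as soon as it is computed, so the full 4D array never has to exist at once. A minimal numpy sketch, where `compute_density_grid` is a hypothetical stand-in for a `dm.density()`-style call:

```python
import numpy as np

# Hypothetical stand-in for computing the density grid at one energy;
# in practice this would be a dm.density()-style call.
def compute_density_grid(iE, shape=(10, 12, 30)):
    rng = np.random.default_rng(iE)
    return rng.random(shape)

energies = range(5)

# Reduce each (grid_x, grid_y, grid_z) grid to (grid_z,) immediately,
# so only the small averaged profiles are kept in memory.
ldos_zE = np.stack(
    [compute_density_grid(iE).mean(axis=(0, 1)) for iE in energies],
    axis=-1,
)
print(ldos_zE.shape)  # (grid_z, nE) = (30, 5)
```

This keeps the peak memory at one grid plus the accumulated `(grid_z, nE)` result.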
As a note, calculating STS spectra is nice for comparing calculations with experiments. These spectra are basically `LDOS(x, E)`, where `x` is some position along a path of interest. Currently, there is no way to project real-space quantities onto a path, or even a generic collection of points, in sisl. One has to go through a grid, which makes it very computation- and memory-demanding. Wouldn't it be nice to have something like this? If so, I can open a separate issue. I just brought it up here because if you only need to compute the LDOS along a path, then both options would be reasonably cheap.
There are 2 other methods I think are more appropriate.
- Move things to Cython.
  - The loop overhead is immense in what we do (for large systems).
  - The data-structures are numpy arrays, so moving them to memoryviews should provide quite a speedup.
  - The interpolation method for the orbitals could be moved to Cython interfaces, for much faster access.
  - Siesta does it more or less like this: it loops over all grid-points sequentially and checks for orbital overlaps.
- Create a data-structure that stores `grid_idx, grid_values` for each orbital. In this way the calculation of the orbital values would be removed, and one could re-use these. It should be much faster, and then one only needs to do the orbital products, which should be a smaller part of the work.
  - The sparse data-structures here are much smaller than what you suggest.
  - It has the overhead of still doing the orbital products, but I think keeping all necessary orbital products is too much memory overhead (it will simply blow up).
  - This is also what Siesta does: it stores the orbital grid values, but simply uses them for the multiplications.
Either way, both could be required, the Cython interface could be used to calculate the grid-values in an appropriate data-structure, which then is used in the subsequent calculation.
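As a rough illustration of the proposed data-structure, here is a minimal sketch of precomputing per-orbital `(grid_idx, grid_values)` pairs once and reusing them for several DMs. All names, the random data, and the dense double loop are hypothetical simplifications, not sisl's actual internals:

```python
import numpy as np

rng = np.random.default_rng(0)
n_grid, n_orb = 1000, 4  # flat grid size and orbital count (illustrative)

# Precompute, once, the sparse grid footprint of each orbital:
# the flat grid indices it touches and its values there.
orb_idx, orb_val = [], []
for io in range(n_orb):
    idx = np.sort(rng.choice(n_grid, size=50, replace=False))
    orb_idx.append(idx)
    orb_val.append(rng.random(idx.size))

def density_on_grid(DM):
    """Accumulate sum_ij DM[i, j] * phi_i(r) * phi_j(r) on the grid."""
    rho = np.zeros(n_grid)
    for i in range(n_orb):
        for j in range(n_orb):
            # orbital products only where both orbitals are non-zero
            common, ii, jj = np.intersect1d(
                orb_idx[i], orb_idx[j], return_indices=True
            )
            rho[common] += DM[i, j] * orb_val[i][ii] * orb_val[j][jj]
    return rho

# The same precomputed orbital values are reused for every DM.
rhos = [density_on_grid(rng.random((n_orb, n_orb))) for _ in range(3)]
print(len(rhos), rhos[0].shape)
```

The point of the sketch is only the separation: orbital values are evaluated once, and only the (cheaper) products are redone per density matrix.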
- Yeah, I already thought of this, but it was a bit scary since I saw method calls inside the loops.
- Yes, that's a good compromise, I think!
I will try to do the implementation, because I want to understand all the details of projecting from orbital space to real space, more specifically how to treat the periodic boundaries which is not yet clear to me :)
By the way, I have thought of an approach that I think would be cheaper. But I'm not sure if there would be problems with precision. Probably you have already thought about this so there must be some kind of drawback.
One could calculate the orbital values for each unique orbital (as with the `orbital.toGrid()` method) and store them. Then, calculating the orbital values for any orbital would just be a matter of shifting the reference grid and interpolating to the actual grid points. One would save the hassle of checking boundaries, calculating spherical coordinates and calculating phi for all orbitals.
If you have fine enough reference orbital grids (say 2x finer than the grid you are projecting onto), this would be precise enough, no? Do you have a sense of how much time this would save, or whether it would save time at all?
Also, if you do this, you could cut some extra time by profiting from the symmetries of the spherical harmonics when computing the reference grids (`toGrid`), no? You only need to compute 1/8th of the grid, and the rest you can infer by symmetry, which at most requires a change of sign, so it would be much cheaper than the full calculations. I think this optimization is not happening in the current `toGrid` method, am I right?
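To make the shift-and-interpolate idea concrete, here is a 1D sketch: the orbital is tabulated once on a fine reference axis and then evaluated at shifted target points by linear interpolation. The Gaussian profile and all names are hypothetical stand-ins; the real 3D case would need a trilinear (or better) interpolator per axis:

```python
import numpy as np

# Precompute the orbital once on a fine reference axis centered at 0
# (hypothetical Gaussian radial profile as a stand-in).
ref_x = np.linspace(-5, 5, 2001)   # fine reference grid (~2x finer)
ref_val = np.exp(-ref_x**2)

def orbital_at(points, center):
    # Shift the reference frame to the orbital center, then interpolate;
    # points beyond the reference range are taken as zero.
    return np.interp(points - center, ref_x, ref_val, left=0.0, right=0.0)

grid = np.linspace(0, 10, 101)     # coarse target grid
vals = orbital_at(grid, center=4.3)

# Compare against direct evaluation on the target grid
exact = np.exp(-(grid - 4.3) ** 2)
print(np.abs(vals - exact).max())  # small interpolation error
```

For a smooth profile on a fine enough reference grid, the interpolation error stays well below typical grid-discretization errors, which is the precision question raised above.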
> By the way, I have thought of an approach that I think would be cheaper. But I'm not sure if there would be problems with precision. Probably you have already thought about this, so there must be some kind of drawback.
>
> One could calculate the orbital values for each unique orbital (as with the `orbital.toGrid()` method) and store them. Then, calculating the orbital values for any orbital would just be a matter of shifting the reference grid and interpolating to the actual grid points. One would save the hassle of checking boundaries, calculating spherical coordinates and calculating phi for all orbitals.
I don't think this would be worth the hassle:
- `toGrid` uses a square grid, so you are storing ~50% zeros.
- The calculation culprit is also the interpolation. I have not timed exactly where time is spent (radial interpolation or spherical harmonics), but my feeling is that you won't save much.
- You still have to calculate the overlap between two orbitals + account for any skewness the cell has (ok, your interpolation function should do the skewness operation, but...).
- I don't see how you would save on checking boundaries? I.e. moving the grid to the position does not automatically take the overlap into account?
> If you have fine enough reference orbital grids (say 2x finer than the grid you are projecting onto), this would be precise enough, no? Do you have a sense of how much time this would save, or whether it would save time at all?
My feeling is that it is not a deal breaker. The only benefit of this would be that you could precalculate it with a very fine grid, then re-use it for multiple different grid precisions, whereas the direct storage for a specific grid would not be transferable. My main worry is the 50% zeros, which to circumvent would require indices anyway ;)
> Also, if you do this, you could cut some extra time by profiting from the symmetries of the spherical harmonics when computing the reference grids (`toGrid`), no? You only need to compute 1/8th of the grid, and the rest you can infer by symmetry, which at most requires a change of sign, so it would be much cheaper than the full calculations. I think this optimization is not happening in the current `toGrid` method, am I right?
Indeed, the symmetry consideration could be ideal. This could (also!) easily be done in the storage capability, i.e. have an array of indices (for the grid indices), and an array of indices of the orbital function, and then only orbital values on the symmetry reduced space. Note that for s orbitals you could do with less than 1/8 (I think). However, with symmetry you should still figure out where to put minus signs for the symmetry copy.
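A small sketch of how the octant reconstruction could look for an orbital with a definite parity along each axis (here a hypothetical p_z-like function: even in x and y, odd in z), with the sign flips applied while mirroring:

```python
import numpy as np

n = 8                                  # points per half-axis
x = np.linspace(0.25, 4, n)            # positive half-axis only
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
octant = Z * np.exp(-(X**2 + Y**2 + Z**2))  # p_z-like values, one octant

def mirror(oct_vals, parity):
    """Rebuild the full (2n)^3 grid from one octant.
    parity[i] is +1 (even) or -1 (odd) along axis i."""
    full = oct_vals
    for ax, p in enumerate(parity):
        flipped = p * np.flip(full, axis=ax)   # mirrored half, with sign
        full = np.concatenate([flipped, full], axis=ax)
    return full

full = mirror(octant, parity=(+1, +1, -1))

# Sanity check against direct evaluation on the full grid
xf = np.concatenate([-x[::-1], x])
Xf, Yf, Zf = np.meshgrid(xf, xf, xf, indexing="ij")
direct = Zf * np.exp(-(Xf**2 + Yf**2 + Zf**2))
print(np.allclose(full, direct))  # True
```

As noted, the bookkeeping reduces to choosing the right sign per mirrored axis, which depends on the parity of the particular spherical harmonic.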
Hmm... I don't know, but my feeling is that it would be better to stick with the grids you are using.
(1) The worry about 50% zeros is because of memory? Do you think that would be a problem? Regarding computation time, doing 2x multiplications (when calculating orbital products) seems like a good compromise if you avoid calculating other things.
(2) Hmm ok, so the radial part is not calculated every time but interpolated? The spherical harmonics are indeed calculated every time, right?
(4) Regarding the overlap, yes you would still need to calculate it I guess.
(5) Regarding symmetry, as I see it you can only easily profit from it if each orbital has its own grid. Then you can put the orbital center exactly at the center of the grid (the center itself not being a point of the grid, and having an even number of points in all directions). Applying symmetries is then extremely easy. I don't see how you would apply the symmetries of all orbitals on a generic grid, because to be perfectly precise you would also need to interpolate, no?
> (1) The worry about 50% zeros is because of memory? Do you think that would be a problem? Regarding computation time, doing 2x multiplications (when calculating orbital products) seems like a good compromise if you avoid calculating other things.

Yes, because of memory. It all depends on whether the user was careful to copy orbitals, or was referencing them.
> (2) Hmm ok, so the radial part is not calculated every time but interpolated? The spherical harmonics are indeed calculated every time, right?

Yes, and I can't recall if they are heavy to calculate, or not... :(
Probably higher `l` are... But I don't know the performance of the underlying routine (`scipy.special.lpmv`).
> (4) Regarding the overlap, yes you would still need to calculate it, I guess.
>
> (5) Regarding symmetry, as I see it you can only easily profit from it if each orbital has its own grid. Then you can put the orbital center exactly at the center of the grid (the center itself not being a point of the grid, and having an even number of points in all directions). Applying symmetries is then extremely easy. I don't see how you would apply the symmetries of all orbitals on a generic grid, because to be perfectly precise you would also need to interpolate, no?

Yes, you are right about the precision, but you are already trying to circumvent some of these things by introducing interpolations. For dense enough grids it shouldn't matter, but for coarse grids it will be a problem...
Ok, for the moment I'm just trying to separate the calculation of the orbital values on the grid from the products as you proposed. Afterwards I will worry about optimization. I think I mostly understand everything, but I have some doubts:
- These lines don't have any effect, do they? https://github.com/zerothi/sisl/blob/7b043eddbe93f5b12138fb53fe8f48c4a484049c/sisl/physics/densitymatrix.py#L417-L423
- Here, when you calculate the indices of the grid that are within the maximum radius of the atom:
https://github.com/zerothi/sisl/blob/7b043eddbe93f5b12138fb53fe8f48c4a484049c/sisl/physics/densitymatrix.py#L640-L660
Why do you move points from outside the grid to the borders of the grid? Why not just delete/filter them? I am confused because, the way you increase the supercell by `maxR` in all directions, you can select some atoms whose influence does not actually reach the grid (atoms can be as far as `sqrt(maxR**2 + maxR**2 + maxR**2)` from the closest grid point). Then all indices are outside the grid and you end up with a border point being selected, which does not actually make sense.
- No they don't.
- I tested quite a bit how to do this, and this seemed like the fastest way.
I have tried many things in this routine to make Python as fast as possible; it wasn't easy. For 1, I can't recall why, probably there was something I thought about, but I can't remember. As for 2, the problem is that you want to remove the points, and doing:

```python
idx = idx[idx[:, 0] >= 0]
...
```

proved much slower (depending on the grid-sampling). I can't recall if I tried:

```python
filter = np.logical_and(0 <= idx[:, 0], idx[:, 0] < shape[0])
filter = np.logical_and(filter, 0 <= idx[:, 1])
...
```

but you are welcome to try it out and see if it is faster; proper tests are really required since the complexity makes it hard to know which calculation is the most optimal.
As for having a failed point: it should only amount to a single point which, if it is inside, should be so far off that the values are 0 anyway. But I get your drift, it really shouldn't be there. It was a performance-wise work-around.
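For reference, the two strategies discussed here (clipping out-of-range indices to the border versus dropping them with a single boolean mask) can be sketched as follows; whether the mask variant is actually faster would still need the proper timing tests mentioned above. The data is random and illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (20, 20, 20)                       # grid shape (illustrative)
idx = rng.integers(-5, 25, size=(1000, 3))  # candidate grid indices

# Strategy A (current behavior): clip out-of-range indices to the border.
clipped = np.clip(idx, 0, np.array(shape) - 1)

# Strategy B: drop out-of-range points entirely with one boolean mask.
mask = np.all((idx >= 0) & (idx < shape), axis=1)
filtered = idx[mask]

print(clipped.shape, filtered.shape)
```

Strategy A keeps the array shape fixed (no fancy-indexing copy) at the cost of a few nonsensical border points, which is the trade-off described above.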
I already have a first working implementation of precalculating orbital values in Python + computing the product with Cython. In terms of computation time, the implementation seems to be between one and two orders of magnitude faster than the current one. However, not all the gain comes from the different approach and Cython; I have found some parts of the current code that are slowing the current implementation down unnecessarily (I think). I will submit the PR and discuss what I have found later today.