hoomd-blue
hoomd-blue copied to clipboard
Built-in support for neighborlists in `hoomd.md.force.Custom`
Description
The hoomd.md.pair.Pair
base class seems to take care of a lot of administrative details (none of which I am familiar with) that seem to be necessary for using a neighborlist in a potential (https://github.com/glotzerlab/hoomd-blue/blob/master/hoomd/md/pair/pair.py#L230).
It would be helpful if hoomd.md.force.Custom
, or a subclass, also could take care of this internal administration to allow an easy way to make custom potentials that use neighbor information.
(Context: I am curious about how hard it would or wouldn't be to bridge a machine-learning potential into HOOMD-blue; building the input to such a potential model requires a neighborlist and calling out to a poorly optimized 3rd party neighborlist in the custom force seems like a very bad solution.)
To input a machine learned potential right now would require a self-generated neighbor list. This is not due to lack of logic in hoomd.md.force.Custom
but rather the hoomd.md.nlist.NList
subclasses do not expose their neighbors. Currently the best solution would be to use a library like https://freud.readthedocs.io/en/latest/ to build the neighbor list and build it large enough so you don't have to recreate it every time-step.
Most of the details Pair
manages center around the pair potential params
structure and other options like energy shifting modes. The behind the scenes management with neighbor lists is to forward the matrix of r_cut values to the NeighborList data structure: https://hoomd-blue.readthedocs.io/en/latest/module-md-nlist.html . We don't currently expose that to Python through the custom force, but could add a CustomPair
force that does.
The main barrier to implementing pairwise interactions with force.Custom
is that NeighborList currently doesn't offer the internal neighbor list to the user. The internal format used in HOOMD is convoluted to make the computation of the neighbor list and the evaluation of pair forces both fast on the GPU. It isn't in a state that can be readily passed to machine learning frameworks.
The main reason we haven't exposed the neighbor list yet is we haven't decided on what the right solution is. We could expose the internal representation as a zero-copy operation and require users to convert that to whatever format they need. This puts a burden on the user, but gives maximal flexibility. The main risk is that the internal representation is now part of the API, so any tweaks we make to optimize the neighbor list will require a major version bump. The alternative is to decide on a simplified user-facing representation and return that. There would be some processing time involved to change representations. Also, whatever representation we choose, the user would still likely need to transform it as different use-cases have different needs.
Some ideas for simplified representations:
- Pair list: i.e. [[0, 10], [500, 20], [80, 12], [500, 230], ...] - a N_pairs by 2 array of pairs of rank-local particle indices. This data structure suitable for use with numpy array indexing and loops over pairs.
- Padded matrix: i.e. [[10, 50, 4, 500, -1, -1], [18, 670, -1, -1, -1, -1], ...] - a N_particles by max(n_neighbors) array listing the neighbors of particle i in each row with -1 sentinel values referring to no neighbor. Note, depending on the use-case, a transposed version of this array with neighbors down columns may be more efficient.
The internal format is similar to the padded matrix, but with variable length rows and a second array that lists the number of neighbors for each particle.
What type of representation would you prefer?
I see, thanks for the quick response! Sounds like something like this is a ways off.
For us (and I believe most ML potentials that I am aware of) the pair list is much more helpful. In fact what you list is just the transpose of our native format of [2, n_pair]
(this is more efficient for taking out the center and neighbor index arrays in row major ordering).
This functionality does not need to be a ways off. I would estimate this would take ~200 lines of code (including documentation). Anyone could implement it now and submit a pull request. We are here to answer any questions.
I would suggest 2 pull requests to be review the pieces separately: One that adds a pair_list
property to NeighborList
and a second that adds a CustomPair
force class.
I'm currently making an implementation of this for my own use cases. Before I make a pull request, I was just wondering if maybe pair_list
should be made a property of CustomPair
instead of NeighborList
. Since CustomPair
will need to handle passing the r_cut matrix to NeighborList
anyway, maybe it's better to make all access happen through that class?
That, or the other route I'd suggest is extending NeighborList
so it can be instantiated and used without having to be attached to a force compute (by having the ability to handle it's own r_cut matrix)? The neighborlist acceleration is useful for interactions beyond pair forces, and the name CustomPair
may be better reserved for isotropic pair forces where a user provides just the force magnitude and energy (like how the C++ pair evaulator classes are implemented, with the possibility to JIT'ing the python func).
That, or the other route I'd suggest is extending
NeighborList
so it can be instantiated and used without having to be attached to a force compute (by having the ability to handle it's own r_cut matrix)?
Just FYI: the C++ code for the NeighborList
already does this
Sorry, I must have missed that 😅. But I don't see anything in the Python API that's able to interact with the r_cuts?
The methods for setting rcut may not be exposed at the Python level. Since multiple objects can attach onto the same NeighborList
, the NeighborList
uses a subscriber mechanism to poll them and take their max:
https://github.com/glotzerlab/hoomd-blue/blob/6a508701a09f0471723c8ed2c8392f52a0eecf35/hoomd/md/NeighborList.cc#L430-L503
pair_list
should be a member of NeighborList
. It owns and computes the data, so it should exist there. This will allow users to access the pairs used by a NeighborList
for pair forces if need be for debugging, analysis, or for reuse with multiple consumers.
As Michael points out there is a consumer model for determining the maximum r_cut
matrix. We should add an API that sets the base r_cut
so that NeighborList
could be used by itself. This would replace the memset(h_r_cut.data, 0, sizeof(Scalar) * m_r_cut.getNumElements());
line. One other thing we would need to check on is whether NeighborList
can be added to the Operations.computes
list so it can be attached separately from a Force
or other consumer class. If not, we need to make that possible.
Also, thanks for taking the time to work on this! It has been an often requested feature but no one else has had the time.
Yes, so I was suggesting that the NeighborList
be able to handle it's own r_cut matrix (by subscribing to itself or with a code path that checks that if standalone r_cuts are enabled) and providing accessor methods to modify the matrix. It might not be super elegant given that the NeighborList
API expects to work with a consumer class, and it's odd to have a code path where the NeighborList
acts as its own consumer, but it would really simplify working with nlists from the Python side.
... and it's odd to have a code path where the
NeighborList
acts as it's own consumer.
Yes, let's not make NeighborList
be its own consumer. The circular references are avoidable, but still needlessly complex. Add another matrix (i.e. m_r_cut_base
) settable that forms the starting point for the max
operation that updateRList
computes.
One other thing we would need to check on is whether NeighborList can be added to the Operations.computes list so it can be attached separately from a Force or other consumer class. If not, we need to make that possible.
Ahh yea, that makes sense. It currently does not work as you'd hope, but I was just taking a look and I think that's as easy as swapping AutotunedObject
for Compute
here.
https://github.com/glotzerlab/hoomd-blue/blob/6a508701a09f0471723c8ed2c8392f52a0eecf35/hoomd/md/nlist.py#L70
Implemented in #1399.