mdanalysis
mdanalysis copied to clipboard
Investigate making Bonds use a C++ unordered map datastructure and Cython
Is your feature request related to a problem?
As far as I can tell the Bonds
datastructures and topology attributes currently use an N*M
matrix to store Bond information where N
is the total number of bonds and M
is the length of the bond type.
For example for the index array bonds
type is of shape [N
, 2].
This may make getting the bonds involved in a specific atom an inefficient process Complexity(O)? due to needing to traverse.
This could possibly be improved by using Cython on its own and also a different representation.
Describe the solution you'd like
Would a data-structure of type:
std::unordered_map<int, std::vector<int>>
where int
is the atom and the vector
contains connected atoms be more efficient?
This is similar to the idea used in topDict
which is stores bond information in
topDict[(1,2)] = [Bond(blah, blah), ....]
This is the equivalent of the C++ type
std::unordered_map<std::tuple<int,int>, std::vector<int>>
Or could we do even better by assuming a small size for the vector and using a fixed size container with a linked list tacked on. For example:
std::vector<std::array<int, 8>>
Where the 8th item in the vector indicates (if present) which vector element the row continues on : Complexity(O)?
This is motivated by the fact that there are not many atoms with 8+ bonds.
Additional context
Still trying to get to grips with the Topology system, so this may be updated as I figure out more.
Or could we do even better by assuming a small size for the vector and using a fixed size container with a linked list tacked on.
Not sure if I understand correctly what you are trying to achieve here, but wouldn't reserving a given number n
of elements for std::vector
achieve mostly the same thing, with less complicated code?
Really good point @RMeli. I guess it depends if the stack allocated memory is faster or not but does sound a lot less complicated.