mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

Investigate making Bonds use a C++ unordered map datastructure and Cython

Open hmacdope opened this issue 2 years ago • 2 comments

Is your feature request related to a problem?

As far as I can tell the Bonds datastructures and topology attributes currently use an N*M matrix to store Bond information where N is the total number of bonds and M is the length of the bond type.

For example for the index array bonds type is of shape [N, 2].

This may make getting the bonds involved in a specific atom an inefficient process Complexity(O)? due to needing to traverse.

This could possibly be improved by using Cython on its own and also a different representation.

Describe the solution you'd like

Would a data-structure of type:

std::unordered_map<int, std::vector<int>>

where int is the atom and the vector contains connected atoms be more efficient?

This is similar to the idea used in topDict which is stores bond information in

topDict[(1,2)] =  [Bond(blah, blah), ....]

This is the equivalent of the C++ type

std::unordered_map<std::tuple<int,int>, std::vector<int>>

Or could we do even better by assuming a small size for the vector and using a fixed size container with a linked list tacked on. For example:

std::vector<std::array<int, 8>>

Where the 8th item in the vector indicates (if present) which vector element the row continues on : Complexity(O)?

This is motivated by the fact that there are not many atoms with 8+ bonds.

Additional context

Still trying to get to grips with the Topology system, so this may be updated as I figure out more.

hmacdope avatar Jun 24 '22 07:06 hmacdope

Or could we do even better by assuming a small size for the vector and using a fixed size container with a linked list tacked on.

Not sure if I understand correctly what you are trying to achieve here, but wouldn't reserving a given number n of elements for std::vector achieve mostly the same thing, with less complicated code?

RMeli avatar Jun 24 '22 09:06 RMeli

Really good point @RMeli. I guess it depends if the stack allocated memory is faster or not but does sound a lot less complicated.

hmacdope avatar Jun 24 '22 22:06 hmacdope