mbuild
Slicing and splitting of Compounds
It may be useful to have some routines to easily split or slice Compounds into smaller Compounds. While subcompounds are easy to obtain for Compounds built in a hierarchical manner, large Compounds (e.g. full molecular systems) loaded from a file won't feature this hierarchy. Here are some examples of the functionality I'm thinking could be useful:
If loading in a box of methane molecules, it would be useful to be able to break the Compound down into subcompounds (i.e. creating another level in the hierarchy) where each subcompound represents a single methane. I'm thinking the API would look something like this:
import mbuild as mb

# Load a methane box containing 100 methane molecules
methane_box = mb.load('box-of-methane.mol2')
# Proposed API: split into 100 equal subcompounds, each named 'Methane'
methane_box.split(100, name='Methane')
where methane_box.children would now return a list of 100 subcompounds.
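The equal-chunk behaviour proposed above can be sketched in plain Python, without mbuild. This is only an illustration of the idea: `split_evenly` is a hypothetical helper, not an existing mbuild function, and it assumes the particles of each molecule are stored contiguously in file order.

```python
def split_evenly(particles, n_chunks):
    """Split a flat sequence of particles into n_chunks equal, contiguous groups."""
    particles = list(particles)
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    if len(particles) % n_chunks != 0:
        raise ValueError("particle count is not divisible by n_chunks")
    if not particles:
        # Nothing to distribute: return n_chunks empty groups.
        return [[] for _ in range(n_chunks)]
    size = len(particles) // n_chunks
    return [particles[i:i + size] for i in range(0, len(particles), size)]


# Stand-in for a 100-molecule methane box: 5 atoms (C, H, H, H, H) per molecule.
atoms = ["C", "H", "H", "H", "H"] * 100
molecules = split_evenly(atoms, 100)
```

Each entry of `molecules` would correspond to one new subcompound. The sketch refuses to split when the particle count is not divisible by the requested number of chunks, which is one possible way to handle that case.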
Similarly, if I wanted to create a subcompound for just the first methane molecule, some functionality like this could be useful:
import mbuild as mb

# Load a methane box containing 100 methane molecules
methane_box = mb.load('box-of-methane.mol2')
# Proposed API: the first 5 particles (1 C + 4 H) make up the first methane
first_methane = methane_box[:5]
Would this depend on the input file having different residue/chain numbers for each molecule?
Maybe the bond graph could be useful in defining separate 'molecules' as subcompounds? I'm thinking of a default call that would split the compound up into subcompounds, each one a group of particles that's not connected to any other group by bonds.
That could be useful for a smarter way of splitting the Compound. I had just been thinking of the simple case where you could break apart a Compound into equal chunks. One issue I see with splitting the Compound by residue/chain numbers/names is that Compounds don't have an attribute to store this information, so even if this information is included in the file used to load in the system, it would be lost when converting to a Compound.
Using the bond graph could be useful. There could be a use_connected argument, or something of that nature, in the split function which would look at the bond graph and create subcompounds for each of the non-connected components.
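As a rough sketch of what a use_connected style split would have to compute, here is a plain-Python connected-components search over a bond list. The function name and the index-pair representation of bonds are assumptions made for illustration; this is not mbuild's bond graph API.

```python
from collections import defaultdict, deque


def connected_components(n_particles, bonds):
    """Group particle indices into molecules: sets of particles linked by bonds."""
    neighbors = defaultdict(set)
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    seen = set()
    components = []
    for start in range(n_particles):
        if start in seen:
            continue
        # Breadth-first search from an unvisited particle collects one molecule.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components


# Two methanes: atoms 0-4 and 5-9, each carbon bonded to its four hydrogens.
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (5, 6), (5, 7), (5, 8), (5, 9)]
molecules = connected_components(10, bonds)
```

Unbonded particles come out as single-particle components, so a file with no bond information would yield one subcompound per atom. That case would probably need a warning or a fallback.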
Along with this, being able to add Compounds together using a simple + operator would be useful (in the same way this is possible for ParmEd Structures).
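One way the + operator could behave, sketched with a toy class rather than the real Compound: the sum is a new parent whose children are the two operands. `ToyCompound` and the default name given to the result are hypothetical and only there to show the `__add__` mechanics.

```python
class ToyCompound:
    """Toy stand-in for a Compound: a name plus a list of children."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children) if children else []

    def __add__(self, other):
        if not isinstance(other, ToyCompound):
            return NotImplemented
        # Return a new parent and leave both operands untouched.
        return ToyCompound("Compound", [self, other])


methane = ToyCompound("Methane")
water = ToyCompound("Water")
combined = methane + water
```

Whether the result should nest the operands like this or flatten their children into one level is an open design choice, and a real implementation would also need to decide whether the operands are copied.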
Finding molecules based on the bond graph is pretty straightforward so we can definitely do that.
The main limitation here comes from which metadata is and is not included in the file that you initially load. @summeraz's first snippet, for example, would only be feasible if each methane already has its own residue or something to that effect called 'Methane' in the initial file and that info gets transferred into the compound names.