mbuild

Slicing and splitting of Compounds

Open summeraz opened this issue 7 years ago • 4 comments

It may be useful to have some routines to easily split or slice Compounds into smaller Compounds. While subcompounds are easy to obtain for Compounds built in a hierarchical manner, large Compounds (i.e. full molecular systems) loaded from a file won't feature this hierarchy. Here are some examples of the functionality I think could be useful:

If loading in a box of methane molecules, it would be useful to be able to break the Compound down into subcompounds (i.e. creating another level in the hierarchy) where each subcompound represents a single methane. I'm thinking the API would look something like this:

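```python
import mbuild as mb

# Load a methane box containing 100 methane molecules
methane_box = mb.load('box-of-methane.mol2')
# Proposed API: split the box into 100 subcompounds named 'Methane'
methane_box.split(100, name='Methane')
```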

where methane_box.children would now return a list of 100 subcompounds.
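A minimal sketch of the equal-chunk splitting behind split(100, ...), written in plain Python rather than against mbuild. The function name split_equal and the flat particle list are hypothetical, and the sketch assumes each molecule's particles are stored contiguously and every molecule has the same number of particles.

```python
def split_equal(particles, n_chunks):
    """Split a flat particle list into n_chunks contiguous, equal-sized groups.

    Hypothetical helper: assumes each molecule's particles are contiguous
    and that all molecules have the same number of particles.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    size, remainder = divmod(len(particles), n_chunks)
    if remainder:
        # Refuse to guess rather than silently producing uneven chunks
        raise ValueError(
            f"{len(particles)} particles cannot be split into {n_chunks} equal chunks"
        )
    return [particles[k * size:(k + 1) * size] for k in range(n_chunks)]
```

With 500 particles (100 methanes of 5 atoms each), split_equal(particles, 100) returns 100 lists of 5 particles; a particle count that does not divide evenly raises ValueError.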

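Similarly, if I wanted to create a subcompound for just the first methane molecule (its first five particles, one carbon and four hydrogens, assuming each molecule's atoms are stored contiguously), functionality like this could be useful: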

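```python
import mbuild as mb

# Load a methane box containing 100 methane molecules
methane_box = mb.load('box-of-methane.mol2')
# Proposed slicing syntax: the first 5 particles (one CH4) as a new Compound
first_methane = methane_box[:5]
```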
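One way the proposed slicing could work is for __getitem__ to return a new compound when it receives a slice. The sketch below uses a hypothetical ToyCompound class, not mbuild's real Compound, and only illustrates the Python mechanism.

```python
class ToyCompound:
    """Hypothetical stand-in for an mbuild Compound (not mbuild's actual class)."""

    def __init__(self, particles, name="Compound"):
        self.particles = list(particles)
        self.name = name

    def __getitem__(self, index):
        # A slice returns a new compound holding the selected particles;
        # an integer index returns a single particle.
        if isinstance(index, slice):
            return ToyCompound(self.particles[index], name=self.name)
        return self.particles[index]

    def __len__(self):
        return len(self.particles)
```

Because list slicing copies the selected references into a new list, box[:5] leaves the original compound untouched.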

summeraz, May 16 '17 18:05

Would this depend on the input file having different residue/chain numbers for each molecule?

Maybe the bond graph could be useful in defining separate 'molecules' as subcompounds? I'm thinking of a default call that would split the compound into subcompounds that are not connected to one another by any bonds.

mattwthompson, May 16 '17 18:05

That could be useful for a smarter way of splitting the Compound. I had just been thinking of the simple case where you could break apart a Compound into equal chunks. One issue I see with splitting the Compound by residue/chain numbers/names is that Compounds don't have an attribute to store this information, so even if this information is included in the file used to load in the system, it would be lost when converting to a Compound.

Using the bond graph could be useful. There could be a use_connected argument, or something of that nature, in the split function, which would look at the bond graph and create a subcompound for each connected component (each group of particles with no bonds to the rest of the system).
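Splitting by the bond graph amounts to finding its connected components. A minimal sketch, assuming particles are indexed 0..n-1 and bonds are given as index pairs (both hypothetical representations, not mbuild's internal bond graph); a use_connected option on split could call something like this:

```python
from collections import defaultdict


def connected_components(n_particles, bonds):
    """Group particle indices into connected components of the bond graph.

    n_particles: number of particles, indexed 0..n_particles-1 (assumed layout).
    bonds: iterable of (i, j) index pairs.
    Returns sorted index lists, ordered by each component's lowest index.
    """
    adjacency = defaultdict(list)
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)

    seen = [False] * n_particles
    components = []
    for start in range(n_particles):
        if seen[start]:
            continue
        # Iterative depth-first search from the first unvisited particle
        stack = [start]
        seen[start] = True
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if not seen[neighbor]:
                    seen[neighbor] = True
                    stack.append(neighbor)
        components.append(sorted(component))
    return components
```

Unbonded particles (e.g. single-atom ions) end up as single-particle components. If the bond graph is already a networkx graph, networkx.connected_components gives the same grouping directly.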

summeraz, May 16 '17 19:05

Along with this, being able to add Compounds together using a simple + operator would be useful (in the same way this is possible for ParmEd Structures).
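Supporting + comes down to defining __add__. A sketch on a hypothetical ToyCompound class (not mbuild's API) that returns a new compound holding copies of both operands' particles:

```python
import copy


class ToyCompound:
    """Hypothetical stand-in for an mbuild Compound; not mbuild's real class."""

    def __init__(self, particles=(), name="Compound"):
        self.particles = list(particles)
        self.name = name

    def __add__(self, other):
        # Only combine with other compounds; returning NotImplemented lets
        # Python raise TypeError for unsupported operand types.
        if not isinstance(other, ToyCompound):
            return NotImplemented
        # Deep copies keep both operands unaffected by later edits to the result.
        combined = copy.deepcopy(self.particles) + copy.deepcopy(other.particles)
        return ToyCompound(combined)
```

Copying is a design choice: the operands stay unchanged, at the cost of duplicating particle data.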

summeraz, May 16 '17 19:05

Finding molecules based on the bond graph is pretty straightforward, so we can definitely do that.

The main limitation here comes from which metadata is and is not included in the file that you initially load. @summeraz's first snippet, for example, would only be feasible if each methane already had its own residue (or something to that effect) called 'Methane' in the initial file, and that info got transferred into the compound names.
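To illustrate the dependence on file metadata: grouping atoms by residue only recovers molecules when the file assigns each molecule its own residue ID. The (residue_id, residue_name, atom_name) record layout and the group_by_residue name below are hypothetical.

```python
def group_by_residue(atoms):
    """Group atom records by residue, preserving file order.

    atoms: iterable of (residue_id, residue_name, atom_name) tuples, a
    hypothetical layout standing in for per-atom metadata read from a file.
    Returns a list of (residue_name, [atom_name, ...]) pairs.
    """
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for residue_id, residue_name, atom_name in atoms:
        groups.setdefault((residue_id, residue_name), []).append(atom_name)
    return [(name, members) for (_, name), members in groups.items()]
```

If every atom carries the same residue ID, the whole box collapses into a single group, which is exactly the limitation described above.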

ctk3b, May 16 '17 19:05