
Support for additional molecule formats

Open zakimolvi opened this issue 8 years ago • 6 comments

It would be helpful to be able to specify mol2 and pdb molecule files as inputs for the Component class.

zakimolvi avatar Aug 24 '17 19:08 zakimolvi

@zakimolvi - can you explain the use case you have in mind? Are you thinking of molecules which are too big to be represented as SMILES?

davidlmobley avatar Aug 24 '17 21:08 davidlmobley

@davidlmobley Yes, molecules which are too complex to be easily represented as SMILES.

zakimolvi avatar Aug 25 '17 02:08 zakimolvi

Thanks, @zakimolvi. Any thoughts/preferences on what the API for this would look like? Maybe something like:

binary_mixture = MixtureSystem()
binary_mixture.addComponent(name='water', mole_fraction=0.2)
# add a filling compound - assumed to be rest of mixture if no
# mole_fraction specified
binary_mixture.addComponent(name='my_really_complex_molecule', file="mymolecule.mol2")
# Build for GROMACS
binary_mixture.build(gromacs = True)

Would that suit? Since the OE tools process most common molecular file formats based on extension, I think doing this and just using OEReadMolecule to process the provided file should do the trick, assuming the provided file (a) contains a single molecule/conformation, and (b) is in a common, standard file format.
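As a rough illustration of assumptions (a) and (b), here is a minimal pre-check that could run before handing the file to the OE tools. It is only a sketch: `check_molecule_file` and `SUPPORTED_EXTENSIONS` are hypothetical names, not part of SolvationToolkit or OpenEye, and the single-molecule test is only attempted for mol2, where each molecule starts with one `@<TRIPOS>MOLECULE` record.

```python
from pathlib import Path

# Hypothetical whitelist for illustration; the real set would be whatever
# formats the OE tools recognize by extension.
SUPPORTED_EXTENSIONS = {".mol2", ".pdb"}


def check_molecule_file(path):
    """Return the lowercase extension if the file looks usable, else raise ValueError."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported molecule format: {ext!r}")
    if ext == ".mol2":
        # Count molecule records; more than one means several molecules
        # or several conformations were stored in the same file.
        n_molecules = sum(
            1
            for line in p.read_text().splitlines()
            if line.strip() == "@<TRIPOS>MOLECULE"
        )
        if n_molecules != 1:
            raise ValueError(
                f"Expected exactly one molecule in {p.name}, found {n_molecules}"
            )
    return ext
```

A check like this would fail early with a readable message rather than silently picking the first molecule from a multi-molecule file.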

What would you want the default behavior to be? Should this use the provided conformation, or attempt to generate a conformer?

Maybe you want to provide an example of the kinds of molecules you have in mind?

Another issue is charging. Normally we calculate AM1-BCC charges for the molecule as a whole. That will perhaps work for SOME molecules which are larger than those which can easily be represented by SMILES, but I think most larger molecules will need a fragmentation scheme, which is way outside of scope here. So, what should we be doing? My inclination is to say that (a) user-provided charges are required if reading from a file, and (b) the code should refuse to execute if user-provided charges are missing (to keep people from shooting themselves in the foot by assuming it will assign charges for them, even though it would be clearly documented that it does not).

Thoughts?

davidlmobley avatar Aug 25 '17 12:08 davidlmobley

@zakimolvi, thoughts on the above? Happy to implement, but I need to make sure what I implement will meet your needs.

davidlmobley avatar Aug 28 '17 21:08 davidlmobley

@davidlmobley The example API you've provided looks great.

I am looking at a relatively simple case where the file could simply be processed using OE tools. I'd like to solvate a box of kinase inhibitors (e.g., erlotinib), in which case being able to supply a mol2 (or pdb) file would be convenient. The file does not contain a unique conformation or charge assignments that need to be preserved, so a conformer should be generated and AM1-BCC charges should be calculated. I have not had issues calculating AM1-BCC charges for the molecules I am interested in, but allowing user-provided charges would be a good workaround for larger molecules.

> b) The code should refuse to execute if user provided charges aren't provided

Perhaps this should be the case for all mol2 files unless the user knowingly specifies an option to calculate AM1-BCC charges.
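To make the proposed policy concrete, here is a minimal sketch of how the check might look for mol2 input. Everything in it is an assumption for illustration: `mol2_partial_charges`, `resolve_charge_source`, and the `compute_am1bcc` flag are hypothetical names, not existing SolvationToolkit API. It reads the optional ninth column of each `@<TRIPOS>ATOM` record and treats a file whose charges are all zero or absent as having no user-provided charges.

```python
def mol2_partial_charges(path):
    """Collect per-atom partial charges from the @<TRIPOS>ATOM section of a mol2 file."""
    charges = []
    in_atoms = False
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped.startswith("@<TRIPOS>"):
                # Track whether we are inside the ATOM section.
                in_atoms = stripped == "@<TRIPOS>ATOM"
                continue
            fields = stripped.split()
            # The charge is the optional ninth column of an ATOM record.
            if in_atoms and len(fields) >= 9:
                charges.append(float(fields[8]))
    return charges


def resolve_charge_source(path, compute_am1bcc=False):
    """Return "file" or "am1bcc"; refuse (ValueError) if neither applies.

    compute_am1bcc is a hypothetical opt-in flag: the user must ask
    explicitly for AM1-BCC charges when the file carries none.
    """
    charges = mol2_partial_charges(path)
    # Assumption: all-zero (or absent) charges mean "no charges provided".
    if any(abs(q) > 1e-6 for q in charges):
        return "file"
    if compute_am1bcc:
        return "am1bcc"
    raise ValueError(
        f"{path} has no partial charges; provide charges in the file "
        "or pass compute_am1bcc=True"
    )
```

The all-zero heuristic is a simplification; a molecule that legitimately has zero charge on every atom would be misread as uncharged input, so a real implementation might want an explicit flag instead.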

Hope that clarifies everything.

zakimolvi avatar Aug 31 '17 00:08 zakimolvi

I totally forgot about this issue. Just assigned it to myself; perhaps I can revisit it fairly soon (or if you'd like to have a shot at it, @zakimolvi, it should be a fairly simple implementation change).

davidlmobley avatar Dec 06 '17 19:12 davidlmobley