BioSimSpace icon indicating copy to clipboard operation
BioSimSpace copied to clipboard

Add parameters to merge region of interest

Open kexul opened this issue 2 years ago • 11 comments

Hi, this PR adds parameters for the merge function, allowing users select regions of interest to merge, which might be useful for modeling protein mutation and covalent binding.

A unit test for the new merge function, along with test input files was updated. The input files were derived from this paper, R346K mutation of SARS-Cov2, the result should be around 1 kcal/mol when these input files were used (0.69 kcal/mol reported in the paper).

Currently, only Gromacs file format is supported when writing the merged molecule.

My colleague tested it retrospectively in an in-house antibody design project, the average Pearson R is 0.6 for simple mutations (no net charge change, no bond breaking), which seems to be promising. The covalent binding experiment is in progress, and the preliminary results look good.

kexul avatar Jul 14 '22 13:07 kexul

Many thanks for this. I'll try to take a closer look as soon as I get a chance.

lohedges avatar Jul 14 '22 13:07 lohedges

Just to note, your PR seems related to this previous issue/question. Here we performed a merge for a mutation by modifying the mapping, rather than the merge function. Essentially, you first create a mapping for all atoms other than the region of interest, then add those that were mapped from within the region. I wonder if this would be an easier approach in your case too, or perhaps there are some issues that I've not considered. One problem we ran into was that the merge could take a long time for a large protein, but I think this might not be so much of an issue now due to some performance enhancements behind the scenes.

lohedges avatar Jul 14 '22 13:07 lohedges

One problem we ran into was that the merge could take a long time for a large protein, but I think this might not be so much of an issue now due to some performance enhancements behind the scenes.

Yes, the main problem I've run into is the speed. It could be wonderful if the performance could be enhanced. 😻 Another problem I've run into is the naming of residues, the whole protein would turn into one residue called 'LIG'. The free energy result remains unaffected, but VMD and pymol do not recognize it as a protein anymore.

kexul avatar Jul 14 '22 13:07 kexul

Ah, yes, good point. The existing merge assumes a single residue and renames to LIG for consistency with what SOMD expects (although this isn't really needed). It makes sense that the merge function can also handle multi-residue molecules. (I'm not sure if simulations of this nature would work with SOMD at present.)

lohedges avatar Jul 14 '22 14:07 lohedges

Thanks again for this, it will be a really useful addtion. I have a few comments questions:

  • Is there a need for both mut_idx and roi? At present it seems like they are completely coupled and the code would only work in the situation where the protein is identical, other that a single mutated residue at the same index in both molecules. If so, I'd prefer an API where the use can simply pass the mut_idx to all of the appropriate functions, i.e. matchAtoms and merge. In the first case, the mapping that is returned could be computed in a similar fashion to as is done in your test, with checks to make sure that the two molecules are identical other than the mutated residue. When passing the mut_idx to merge, the roi could automatically be determined from the atom indices of the matching residue in both molecules.

  • Looking at the modified merge code, it is only used when computing the modified intrascale properties at each end state. Comparing this section of the code to devel it appears to have been modified in a way that is inconsistent with the original, i.e. you don't only set terms when they are non-zero, which would overwrite values. (The order in which the matrix is updated is important for the precedence of the terms.) You also extract values from the existing intrascale matrices (as is done in the original code), yet the values aren't used anywhere, e.g. here. Since the single-point energy tests pass it looks like all is okay, at least for the molecules tested. I'll try to run some checks against a range of different molecules to make sure that things still work. (I recall it taking a while to make sure that the terms in the modified intrascale matrix were correct.) I tried updating the code locally to follow the old approach. While your unit test still passes, for some reason the existing test_merge fails. (I'll try to figure out why.)

Cheers.

lohedges avatar Jul 19 '22 13:07 lohedges

I've now fixed my local copy so that the old version of setting terms in the intrascale matrix works for both test_merge and test_merge_roi. (There was some inconsistency in the names for mapped indices between your code and the original.) I'll run some more checks to make sure that both approaches give the same terms in all cases.

lohedges avatar Jul 19 '22 14:07 lohedges

Hi @lohedges , thanks for the feedback!

I'd prefer an API where the use can simply pass the mut_idx to all of the appropriate functions

That sounds good! I was considering a complete protein mutation solution (inserting and deleting of the amino acid involved) when I first wrote the code, in which case atom indexes should be suitable for general use. But in later practice, I found that residue information was needed to restore a protein, so mut_idx was added.

Comparing this section of the code to devel it appears to have been modified in a way that is inconsistent with the original, i.e. you don't only set terms when they are non-zero, which would overwrite values. (The order in which the matrix is updated is important for the precedence of the terms.)

Sorry for that, I copied the code from my current working branch, in which I've implemented the soft bond potential to modeling scaffold hopping. This section of the code was modified to handle the case where the bond connection type was changed during merging.

You also extract values from the existing intrascale matrices (as is done in the original code), yet the values aren't used anywhere, e.g. here.

Sorry, that seems to be redundant code I forget to clean. 🙈

kexul avatar Jul 19 '22 17:07 kexul

I was considering a complete protein mutation solution (inserting and deleting of the amino acid involved) when I first wrote the code, in which case atom indexes should be suitable for general use. But in later practice, I found that residue information was needed to restore a protein, so mut_idx was added.

No problem, I often have the same issue. It's only when you start working on things that you figure out exactly what information is needed and nail down the API. I tend to go back and strip things back to the bare minimum once it's working. We can always add back extra options for more flexibility in future.

Sorry for that, I copied the code from my current working branch, in which I've implemented the soft bond potential to modeling scaffold hopping. This section of the code was modified to handle the case where the bond connection type was changed during merging.

No worries, the unit tests pass so it still seems to work. I was just confused when I first looked at it. I can switch it back to the current version once merged. Apologies for the lack of progress on your soft-bond feature. @jmichel80 has been super busy and I don't have the technical (theoretical) expertise to look at it. (At least without devoting a bit of time to it, anyway.)

One other thing that I forgot to mention was in relation to this:

Currently, only Gromacs file format is supported when writing the merged molecule.

When I get a moment I'll see how the SOMD pert file writer deals with mutations. I think it should work, or at least only require minor modifications. If not, we should add checks to the FEP setup (both in FreeEnergy.Relative and Process.Somd) to raise an exception when a multi-residue perturbable molecule is detected.

Cheers.

lohedges avatar Jul 20 '22 09:07 lohedges

Thanks for your thorough inspection! It's nice to see the API getting cleaner.

Apologies for the lack of progress on your soft-bond feature.

No problem, feel free to leave comments whenever suits you.

When I get a moment I'll see how the SOMD pert file writer deals with mutations.

Great! Looking forward to seeing it in the future.

kexul avatar Jul 20 '22 11:07 kexul

Hi there,

Just to say that it looks like this feature has been picked up by Exscientia here. From the commits, it seems like a few of my suggestions have been incorporated, it has been generalised to multiple mutations, a few bugs fixed, etc. I'll wait to see if there's an updated PR related to that work before merging.

Cheers.

lohedges avatar Sep 22 '22 12:09 lohedges

Wow! Great to hear that! 🥂

kexul avatar Sep 22 '22 13:09 kexul