BioSimSpace
BioSimSpace copied to clipboard
Add parameters to merge region of interest
Hi, this PR adds parameters for the merge function, allowing users select regions of interest to merge, which might be useful for modeling protein mutation and covalent binding.
A unit test for the new merge function, along with test input files was updated. The input files were derived from this paper, R346K mutation of SARS-Cov2, the result should be around 1 kcal/mol when these input files were used (0.69 kcal/mol reported in the paper).
Currently, only Gromacs file format is supported when writing the merged molecule.
My colleague tested it retrospectively in an in-house antibody design project, the average Pearson R is 0.6 for simple mutations (no net charge change, no bond breaking), which seems to be promising. The covalent binding experiment is in progress, and the preliminary results look good.
Many thanks for this. I'll try to take a closer look as soon as I get a chance.
Just to note, your PR seems related to this previous issue/question. Here we performed a merge for a mutation by modifying the mapping, rather than the merge function. Essentially, you first create a mapping for all atoms other than the region of interest, then add those that were mapped from within the region. I wonder if this would be an easier approach in your case too, or perhaps there are some issues that I've not considered. One problem we ran into was that the merge could take a long time for a large protein, but I think this might not be so much of an issue now due to some performance enhancements behind the scenes.
One problem we ran into was that the merge could take a long time for a large protein, but I think this might not be so much of an issue now due to some performance enhancements behind the scenes.
Yes, the main problem I've run into is the speed. It could be wonderful if the performance could be enhanced. 😻 Another problem I've run into is the naming of residues, the whole protein would turn into one residue called 'LIG'. The free energy result remains unaffected, but VMD and pymol do not recognize it as a protein anymore.
Ah, yes, good point. The existing merge assumes a single residue and renames to LIG
for consistency with what SOMD expects (although this isn't really needed). It makes sense that the merge function can also handle multi-residue molecules. (I'm not sure if simulations of this nature would work with SOMD at present.)
Thanks again for this, it will be a really useful addtion. I have a few comments questions:
-
Is there a need for both
mut_idx
androi
? At present it seems like they are completely coupled and the code would only work in the situation where the protein is identical, other that a single mutated residue at the same index in both molecules. If so, I'd prefer an API where the use can simply pass themut_idx
to all of the appropriate functions, i.e.matchAtoms
andmerge
. In the first case, the mapping that is returned could be computed in a similar fashion to as is done in your test, with checks to make sure that the two molecules are identical other than the mutated residue. When passing themut_idx
tomerge
, theroi
could automatically be determined from the atom indices of the matching residue in both molecules. -
Looking at the modified merge code, it is only used when computing the modified
intrascale
properties at each end state. Comparing this section of the code todevel
it appears to have been modified in a way that is inconsistent with the original, i.e. you don't only set terms when they are non-zero, which would overwrite values. (The order in which the matrix is updated is important for the precedence of the terms.) You also extract values from the existingintrascale
matrices (as is done in the original code), yet the values aren't used anywhere, e.g. here. Since the single-point energy tests pass it looks like all is okay, at least for the molecules tested. I'll try to run some checks against a range of different molecules to make sure that things still work. (I recall it taking a while to make sure that the terms in the modified intrascale matrix were correct.) I tried updating the code locally to follow the old approach. While your unit test still passes, for some reason the existingtest_merge
fails. (I'll try to figure out why.)
Cheers.
I've now fixed my local copy so that the old version of setting terms in the intrascale matrix works for both test_merge
and test_merge_roi
. (There was some inconsistency in the names for mapped indices between your code and the original.) I'll run some more checks to make sure that both approaches give the same terms in all cases.
Hi @lohedges , thanks for the feedback!
I'd prefer an API where the use can simply pass the mut_idx to all of the appropriate functions
That sounds good!
I was considering a complete protein mutation solution (inserting and deleting of the amino acid involved) when I first wrote the code, in which case atom indexes should be suitable for general use. But in later practice, I found that residue information was needed to restore a protein, so mut_idx
was added.
Comparing this section of the code to devel it appears to have been modified in a way that is inconsistent with the original, i.e. you don't only set terms when they are non-zero, which would overwrite values. (The order in which the matrix is updated is important for the precedence of the terms.)
Sorry for that, I copied the code from my current working branch, in which I've implemented the soft bond potential to modeling scaffold hopping. This section of the code was modified to handle the case where the bond connection type was changed during merging.
You also extract values from the existing intrascale matrices (as is done in the original code), yet the values aren't used anywhere, e.g. here.
Sorry, that seems to be redundant code I forget to clean. 🙈
I was considering a complete protein mutation solution (inserting and deleting of the amino acid involved) when I first wrote the code, in which case atom indexes should be suitable for general use. But in later practice, I found that residue information was needed to restore a protein, so mut_idx was added.
No problem, I often have the same issue. It's only when you start working on things that you figure out exactly what information is needed and nail down the API. I tend to go back and strip things back to the bare minimum once it's working. We can always add back extra options for more flexibility in future.
Sorry for that, I copied the code from my current working branch, in which I've implemented the soft bond potential to modeling scaffold hopping. This section of the code was modified to handle the case where the bond connection type was changed during merging.
No worries, the unit tests pass so it still seems to work. I was just confused when I first looked at it. I can switch it back to the current version once merged. Apologies for the lack of progress on your soft-bond feature. @jmichel80 has been super busy and I don't have the technical (theoretical) expertise to look at it. (At least without devoting a bit of time to it, anyway.)
One other thing that I forgot to mention was in relation to this:
Currently, only Gromacs file format is supported when writing the merged molecule.
When I get a moment I'll see how the SOMD pert file writer deals with mutations. I think it should work, or at least only require minor modifications. If not, we should add checks to the FEP setup (both in FreeEnergy.Relative
and Process.Somd
) to raise an exception when a multi-residue perturbable molecule is detected.
Cheers.
Thanks for your thorough inspection! It's nice to see the API getting cleaner.
Apologies for the lack of progress on your soft-bond feature.
No problem, feel free to leave comments whenever suits you.
When I get a moment I'll see how the SOMD pert file writer deals with mutations.
Great! Looking forward to seeing it in the future.
Hi there,
Just to say that it looks like this feature has been picked up by Exscientia here. From the commits, it seems like a few of my suggestions have been incorporated, it has been generalised to multiple mutations, a few bugs fixed, etc. I'll wait to see if there's an updated PR related to that work before merging.
Cheers.
Wow! Great to hear that! 🥂