haddock3 icon indicating copy to clipboard operation
haddock3 copied to clipboard

Add OpenMM module

Open joaomcteixeira opened this issue 3 years ago • 28 comments

The code presented in this commit was entirely done by Ivar de Leeuw during his Master course: Major Research Project (GSLS), academic year 2021-2022.

I am pushing the code after Ivar's request and in its exact original form.

In the next commits, the team will work on Ivar's code to accommodate it fully to the HADDOCK3 project.

Thanks, Ivar, for all your great work and best luck with your future studies!

joaomcteixeira avatar Jun 20 '22 09:06 joaomcteixeira

Made a general linting around files. Addition lint can still be done on function and variable names.

joaomcteixeira avatar Jun 20 '22 10:06 joaomcteixeira

Hi @joaomcteixeira, just had a quick chat with Ivar De Leeuw about the version of the dependencies he used: it's openmm v. 7.7.0 and pdbfixer 1.8.1. I'll modify the INSTALL.md accordingly

mgiulini avatar Jun 21 '22 08:06 mgiulini

Okay. I strongly suggest having them defined as optional dependencies, same as lightdock and gdock.

joaomcteixeira avatar Jun 21 '22 08:06 joaomcteixeira

Okay. I strongly suggest having them defined as optional dependencies, same as lightdock and gdock.

I totally agree.

mgiulini avatar Jun 21 '22 09:06 mgiulini

tried a run with the example file openmm-test.cfg and commit ID #c79f47a. It's already a great work. A few "style" improvements from my user experience:

  • [x] we should clarify if the simulation uses implicit or explicit solvent at the beginning
  • [x] we should report the difference between equilibration and the actual simulation
  • [x] the logging during the actual simulation should be more frequent to allow the user to track the progress, we should inspect a bit the core openmm functions
  • [x] we could split the function runOpenMM in two or more pieces for enhancing the readability
  • [x] a warning appeared Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead. probably we can safely ignore it, but I still have to check
  • [x] the integrator seed should be added as a user-defined parameter and printed during the execution, so that we could obtain (almost) identical calculations

mgiulini avatar Jun 21 '22 10:06 mgiulini

a warning appeared Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead. probably we can safely ignore it, but I still have to check

I would say this is not ours and is something from openmm because we don't import from simtk. not in the original Ivar's code nor in the linted one.

joaomcteixeira avatar Jun 21 '22 11:06 joaomcteixeira

@mgiulini, the pre-last commit solves the globals() problem. It was not necessary because the value was the same as the key, so we can simply avoid calling the globals() dictionary.

Also corrected some defaults that were lists to single strings

joaomcteixeira avatar Jun 21 '22 12:06 joaomcteixeira

As for the reproducibility: it is possible to assign a seed to the integrators, but not to the modeller functions. Therefore, the solvation boxes specified at the beginning of the simulations are bound to differ.

mgiulini avatar Jul 06 '22 15:07 mgiulini

First prettified, working version of the openmm module.

Still some things to fix/implement:

  • [x] check for GPU support. GPU support should be easily available with the hpc mode and the appropriate queue, but it's currently impossible to test this because libhpc accepts only CNS jobs. The hard dependency on the envvars variables should be somehow removed/smoothed. FIXED: openmm automatically detects the GPU in the local mode, if the GPU is available and correctly configured (check it with python -m openmm.testInstallation)
  • [x] capability of running on complexes (for ex. coming from a topoaa ensemble) for refinement. DONE: achieved handling input PDBFiles in a better way (analogous to caprieval). TODO: large(r) scale benchmarking for assessing the usefulness of the openmm refinement
  • [ ] tests: how to implement them? TODO: they will be addressed in a separate issue
  • [ ] the check for the correct installation must be improved and it should be executed only if the openmm module is part of the chosen workflow
  • [x] the --restart option does not work: the error says IsADirectoryError: [Errno 21] Is a directory: 'run1-test/3_openmm/md_raw_output'
  • [ ] currently the boxsize is fixed by default at 10 nanometers. This can be customized (good) but the default is wrong, cause a 10x10x10 box can be either too small (periodic effects affect the simulation) or too big (waste of computational resources). We should write a function that authomatically determines this parameter when not specified by the user. Also, it would be nice to see if it's possible to use the rombic dodecahedron to save computational time. TODO: address this in a separate issue.
  • [x] we must do some parameter check on the number of intermediates to save. I'd say that 100 should be the maximum number.

mgiulini avatar Jul 07 '22 15:07 mgiulini

Hi @mgiulini I have addressed what we discussed on slack; tests are now passing. Can you test the workflows to see if they work as expected?

We can have the tests for the openmm module itself in a separate pull request.

joaomcteixeira avatar Jul 12 '22 12:07 joaomcteixeira

Codecov Report

Attention: 336 lines in your changes are missing coverage. Please review.

Comparison is base (0996f0c) 74.42% compared to head (fb42264) 71.67%. Report is 324 commits behind head on main.

Files Patch % Lines
src/haddock/modules/refinement/openmm/openmm.py 1.00% 296 Missing :warning:
src/haddock/modules/refinement/openmm/__init__.py 34.48% 38 Missing :warning:
src/haddock/libs/libsubprocess.py 33.33% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #482      +/-   ##
==========================================
- Coverage   74.42%   71.67%   -2.75%     
==========================================
  Files         111      113       +2     
  Lines        7644     8004     +360     
==========================================
+ Hits         5689     5737      +48     
- Misses       1955     2267     +312     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jul 12 '22 15:07 codecov-commenter

okay @mgiulini , reverted the parameters to their original name and considered them as strings because considering them as _fname would force us to add some exception in the prepare_run. We will see in the future if we need to consider them as paths for any reason. But surely not. For example, in the future, in the case the user can provide the path to a specific forcefield file, we can have it as a separate parameter (custom_forcefield_fname, for example). Cheers

joaomcteixeira avatar Jul 12 '22 16:07 joaomcteixeira

All examples run. :+1:

joaomcteixeira avatar Jul 13 '22 10:07 joaomcteixeira

@joaomcteixeira I see that the topoaa module, if it's not the first module in the .cfg file, creates the topology only for the first input model. I remember we somehow discussed this in the chat with Ivar. In the context of openmm we should generate the topology for each input model, what do you think?

try to run the example file openmm-topoaa-flexref-test.cfg, you will see that only a complex gets to the second topoaa and then to flexref

mgiulini avatar Jul 15 '22 14:07 mgiulini

Yes, I remember that. The part which controls that is here:

https://github.com/haddocking/haddock3/blob/c404a4ab309e2f7eac181dfc0226e4e6d499bf3c/src/haddock/modules/topology/topoaa/init.py#L118-L128

And the culprit is that [0] index. We can make [0] a parameter or a guesser that behaves differently according to the situation that is presented to topoaa. In case we chose it t be a parameter the user needs to know what he/she's doing and know the parameter exists (which is not bad in the end). A guesser can be defined following the module preceding topoaa, but that would imply that the behaviour is constant for each module.

joaomcteixeira avatar Jul 16 '22 15:07 joaomcteixeira

Yes, I remember that. The part which controls that is here:

https://github.com/haddocking/haddock3/blob/c404a4ab309e2f7eac181dfc0226e4e6d499bf3c/src/haddock/modules/topology/topoaa/init.py#L118-L128

And the culprit is that [0] index. We can make [0] a parameter or a guesser that behaves differently according to the situation that is presented to topoaa. In case we chose it t be a parameter the user needs to know what he/she's doing and know the parameter exists (which is not bad in the end). A guesser can be defined following the module preceding topoaa, but that would imply that the behaviour is constant for each module.

Yes, I found that [0] too. The question I have here is that: isn't it better to completely remove this prescription? so far we know that topoaa should not be executed if not at the beginning of a workflow, except for the openmm module (and the cg module in the future). If the users choose to execute topoaa somewhere in the workflow, I would just recreate all the topologies..what do you think? Currently the execution of topoaa after rigidbody just leads to the creation of a single topology and, consequently, the next modules are bound to be executed on a single structure.

For example, currently a workflow with topoaa-rigidbody-topoaa-flexref outputs a single refined model...and that's not correct.

mgiulini avatar Jul 18 '22 07:07 mgiulini

It makes sense what you are saying. Maybe the [0] was something made just as a draft. I think it is fine to remove the [0] and update the topoaa operations when performed in the middle of the workflow. I think you can do that directly here in this PR.

joaomcteixeira avatar Jul 18 '22 08:07 joaomcteixeira

It makes sense what you are saying. Maybe the [0] was something made just as a draft. I think it is fine to remove the [0] and update the topoaa operations when performed in the middle of the workflow. I think you can do that directly here in this PR.

will do, thanks for the feedback! I think it's a good improvement, as it allows to reduce the complexity of the overall logic. maybe I will create a separate PR, as this is not strictly related to openmm and could be relevant in other scenarios.

UPDATE: link issue #497

mgiulini avatar Jul 18 '22 08:07 mgiulini

Perfect. I agree with using a new PR! :+1:

joaomcteixeira avatar Jul 18 '22 09:07 joaomcteixeira

hi all! I would say that the first alpha version of the openmm module is ready for review. Although I changed the code substantially, the overall structure is the same as the one created by Ivar. There are still some major improvements to be made, such as

  1. the addition of GPU support
  2. the creation of a test suite
  3. a more resources-friendly creation of the solvation box

Still, I would say that these can be addressed separately in other issues.

mgiulini avatar Aug 31 '22 12:08 mgiulini

Aren’t the solvation boxes built from a pre-equilibrated water box?

amjjbonvin avatar Oct 11 '22 07:10 amjjbonvin

image

This is so beautiful!

joaomcteixeira avatar Oct 11 '22 07:10 joaomcteixeira

Aren’t the solvation boxes built from a pre-equilibrated water box?

Hi Alex, the solvation box in openmm is built by default using the TIP3P solvent model, which should be already equilibrated. But the solute-solvent interface mixture is not equilibrated at all, as the water molecules are simply removed in correspondence of the solute.

mgiulini avatar Oct 11 '22 07:10 mgiulini

That’s standard procedure.

The first equilibration step should be to minimise the solvent with the solute position restrainted, and then heat up the system slowly releasing the solute so that the solvent adapts to the solute (and not vice-versa)

amjjbonvin avatar Oct 11 '22 07:10 amjjbonvin

That’s standard procedure. The first equilibration step should be to minimise the solvent with the solute position restrainted, and then heat up the system slowly releasing the solute so that the solvent adapts to the solute (and not vice-versa)

Precisely! Following Ivar's code, I didn't include the full procedure in the current implementation to save resources. The system is built, energy-minimized, and then quickly "equilibrated" at the simulation temperature (300 K) for a few steps. Should I include the full equilibration protocol?

mgiulini avatar Oct 11 '22 08:10 mgiulini

Precisely! Following Ivar's code, I didn't include the full procedure in the current implementation to save resources. The system is built, energy-minimized, and then quickly "equilibrated" at the simulation temperature (300 K) for a few steps. Should I include the full equilibration protocol?

I guess this should be checked at least. In principle starting directly at 300K is fine as long as there are positions restraints on the solute that are slowly removed.

amjjbonvin avatar Oct 11 '22 09:10 amjjbonvin

Precisely! Following Ivar's code, I didn't include the full procedure in the current implementation to save resources. The system is built, energy-minimized, and then quickly "equilibrated" at the simulation temperature (300 K) for a few steps. Should I include the full equilibration protocol? I guess this should be checked at least. In principle starting directly at 300K is fine as long as there are positions restraints on the solute that are slowly removed.

I will investigate the code more thoroughly then :)

mgiulini avatar Oct 11 '22 09:10 mgiulini

resolve #776

VGPReys avatar Jan 05 '24 11:01 VGPReys

What will happen if a protein-ligand complex is passed to the openmm module? Is there a check that only "pure" protein and nuclei acids are accepted? Also I assume (did not check the openmm code) that the system is neutralised by adding counter ions. Is this true?

  • System is neutralized
  • No yet checks for ligands... must do it, good point !

VGPReys avatar Oct 01 '24 08:10 VGPReys

  • Added try/except logic to handle cases where topology cannot be generated due to unknown (by the force-field) residues.
  • Also fixed module when using implicit solvent !

VGPReys avatar Oct 02 '24 13:10 VGPReys