haddock3
haddock3 copied to clipboard
Add OpenMM module
The code presented in this commit was entirely done by Ivar de Leeuw during his Master course: Major Research Project (GSLS), academic year 2021-2022.
I am pushing the code after Ivar's request and in its exact original form.
In the next commits, the team will work on Ivar's code to accommodate it fully to the HADDOCK3 project.
Thanks, Ivar, for all your great work and best luck with your future studies!
Made a general linting around files. Addition lint can still be done on function and variable names.
Hi @joaomcteixeira, just had a quick chat with Ivar De Leeuw about the version of the dependencies he used: it's openmm v. 7.7.0 and pdbfixer 1.8.1. I'll modify the INSTALL.md accordingly
Okay. I strongly suggest having them defined as optional dependencies, same as lightdock and gdock.
Okay. I strongly suggest having them defined as optional dependencies, same as lightdock and gdock.
I totally agree.
tried a run with the example file openmm-test.cfg and commit ID #c79f47a. It's already a great work. A few "style" improvements from my user experience:
- [x] we should clarify if the simulation uses implicit or explicit solvent at the beginning
- [x] we should report the difference between equilibration and the actual simulation
- [x] the logging during the actual simulation should be more frequent to allow the user to track the progress, we should inspect a bit the core openmm functions
- [x] we could split the function runOpenMM in two or more pieces for enhancing the readability
- [x] a warning appeared
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.probably we can safely ignore it, but I still have to check - [x] the integrator seed should be added as a user-defined parameter and printed during the execution, so that we could obtain (almost) identical calculations
a warning appeared Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead. probably we can safely ignore it, but I still have to check
I would say this is not ours and is something from openmm because we don't import from simtk. not in the original Ivar's code nor in the linted one.
@mgiulini, the pre-last commit solves the globals() problem. It was not necessary because the value was the same as the key, so we can simply avoid calling the globals() dictionary.
Also corrected some defaults that were lists to single strings
As for the reproducibility: it is possible to assign a seed to the integrators, but not to the modeller functions. Therefore, the solvation boxes specified at the beginning of the simulations are bound to differ.
First prettified, working version of the openmm module.
Still some things to fix/implement:
- [x] check for GPU support. GPU support should be easily available with the hpc mode and the appropriate queue, but it's currently impossible to test this because
libhpcaccepts only CNS jobs. The hard dependency on theenvvarsvariables should be somehow removed/smoothed. FIXED: openmm automatically detects the GPU in the local mode, if the GPU is available and correctly configured (check it withpython -m openmm.testInstallation) - [x] capability of running on complexes (for ex. coming from a topoaa ensemble) for refinement. DONE: achieved handling input PDBFiles in a better way (analogous to caprieval). TODO: large(r) scale benchmarking for assessing the usefulness of the openmm refinement
- [ ] tests: how to implement them? TODO: they will be addressed in a separate issue
- [ ] the check for the correct installation must be improved and it should be executed only if the openmm module is part of the chosen workflow
- [x] the --restart option does not work: the error says
IsADirectoryError: [Errno 21] Is a directory: 'run1-test/3_openmm/md_raw_output' - [ ] currently the
boxsizeis fixed by default at 10 nanometers. This can be customized (good) but the default is wrong, cause a 10x10x10 box can be either too small (periodic effects affect the simulation) or too big (waste of computational resources). We should write a function that authomatically determines this parameter when not specified by the user. Also, it would be nice to see if it's possible to use the rombic dodecahedron to save computational time. TODO: address this in a separate issue. - [x] we must do some parameter check on the number of intermediates to save. I'd say that 100 should be the maximum number.
Hi @mgiulini I have addressed what we discussed on slack; tests are now passing. Can you test the workflows to see if they work as expected?
We can have the tests for the openmm module itself in a separate pull request.
Codecov Report
Attention: 336 lines in your changes are missing coverage. Please review.
Comparison is base (
0996f0c) 74.42% compared to head (fb42264) 71.67%. Report is 324 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #482 +/- ##
==========================================
- Coverage 74.42% 71.67% -2.75%
==========================================
Files 111 113 +2
Lines 7644 8004 +360
==========================================
+ Hits 5689 5737 +48
- Misses 1955 2267 +312
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
okay @mgiulini , reverted the parameters to their original name and considered them as strings because considering them as _fname would force us to add some exception in the prepare_run. We will see in the future if we need to consider them as paths for any reason. But surely not. For example, in the future, in the case the user can provide the path to a specific forcefield file, we can have it as a separate parameter (custom_forcefield_fname, for example).
Cheers
All examples run. :+1:
@joaomcteixeira I see that the topoaa module, if it's not the first module in the .cfg file, creates the topology only for the first input model. I remember we somehow discussed this in the chat with Ivar. In the context of openmm we should generate the topology for each input model, what do you think?
try to run the example file openmm-topoaa-flexref-test.cfg, you will see that only a complex gets to the second topoaa and then to flexref
Yes, I remember that. The part which controls that is here:
https://github.com/haddocking/haddock3/blob/c404a4ab309e2f7eac181dfc0226e4e6d499bf3c/src/haddock/modules/topology/topoaa/init.py#L118-L128
And the culprit is that [0] index. We can make [0] a parameter or a guesser that behaves differently according to the situation that is presented to topoaa. In case we chose it t be a parameter the user needs to know what he/she's doing and know the parameter exists (which is not bad in the end). A guesser can be defined following the module preceding topoaa, but that would imply that the behaviour is constant for each module.
Yes, I remember that. The part which controls that is here:
https://github.com/haddocking/haddock3/blob/c404a4ab309e2f7eac181dfc0226e4e6d499bf3c/src/haddock/modules/topology/topoaa/init.py#L118-L128
And the culprit is that
[0]index. We can make[0]a parameter or a guesser that behaves differently according to the situation that is presented totopoaa. In case we chose it t be a parameter the user needs to know what he/she's doing and know the parameter exists (which is not bad in the end). A guesser can be defined following the module precedingtopoaa, but that would imply that the behaviour is constant for each module.
Yes, I found that [0] too. The question I have here is that: isn't it better to completely remove this prescription? so far we know that topoaa should not be executed if not at the beginning of a workflow, except for the openmm module (and the cg module in the future). If the users choose to execute topoaa somewhere in the workflow, I would just recreate all the topologies..what do you think? Currently the execution of topoaa after rigidbody just leads to the creation of a single topology and, consequently, the next modules are bound to be executed on a single structure.
For example, currently a workflow with topoaa-rigidbody-topoaa-flexref outputs a single refined model...and that's not correct.
It makes sense what you are saying. Maybe the [0] was something made just as a draft. I think it is fine to remove the [0] and update the topoaa operations when performed in the middle of the workflow. I think you can do that directly here in this PR.
It makes sense what you are saying. Maybe the
[0]was something made just as a draft. I think it is fine to remove the[0]and update thetopoaaoperations when performed in the middle of the workflow. I think you can do that directly here in this PR.
will do, thanks for the feedback! I think it's a good improvement, as it allows to reduce the complexity of the overall logic. maybe I will create a separate PR, as this is not strictly related to openmm and could be relevant in other scenarios.
UPDATE: link issue #497
Perfect. I agree with using a new PR! :+1:
hi all! I would say that the first alpha version of the openmm module is ready for review. Although I changed the code substantially, the overall structure is the same as the one created by Ivar. There are still some major improvements to be made, such as
- the addition of GPU support
- the creation of a test suite
- a more resources-friendly creation of the solvation box
Still, I would say that these can be addressed separately in other issues.
Aren’t the solvation boxes built from a pre-equilibrated water box?

This is so beautiful!
Aren’t the solvation boxes built from a pre-equilibrated water box?
Hi Alex, the solvation box in openmm is built by default using the TIP3P solvent model, which should be already equilibrated. But the solute-solvent interface mixture is not equilibrated at all, as the water molecules are simply removed in correspondence of the solute.
That’s standard procedure.
The first equilibration step should be to minimise the solvent with the solute position restrainted, and then heat up the system slowly releasing the solute so that the solvent adapts to the solute (and not vice-versa)
That’s standard procedure. The first equilibration step should be to minimise the solvent with the solute position restrainted, and then heat up the system slowly releasing the solute so that the solvent adapts to the solute (and not vice-versa)
Precisely! Following Ivar's code, I didn't include the full procedure in the current implementation to save resources. The system is built, energy-minimized, and then quickly "equilibrated" at the simulation temperature (300 K) for a few steps. Should I include the full equilibration protocol?
Precisely! Following Ivar's code, I didn't include the full procedure in the current implementation to save resources. The system is built, energy-minimized, and then quickly "equilibrated" at the simulation temperature (300 K) for a few steps. Should I include the full equilibration protocol?
I guess this should be checked at least. In principle starting directly at 300K is fine as long as there are positions restraints on the solute that are slowly removed.
Precisely! Following Ivar's code, I didn't include the full procedure in the current implementation to save resources. The system is built, energy-minimized, and then quickly "equilibrated" at the simulation temperature (300 K) for a few steps. Should I include the full equilibration protocol? I guess this should be checked at least. In principle starting directly at 300K is fine as long as there are positions restraints on the solute that are slowly removed.
I will investigate the code more thoroughly then :)
resolve #776
What will happen if a protein-ligand complex is passed to the openmm module? Is there a check that only "pure" protein and nuclei acids are accepted? Also I assume (did not check the openmm code) that the system is neutralised by adding counter ions. Is this true?
- System is neutralized
- No yet checks for ligands... must do it, good point !
- Added try/except logic to handle cases where topology cannot be generated due to unknown (by the force-field) residues.
- Also fixed module when using implicit solvent !