Add small molecule repex consistency tests
Description
Adds neutral and charge-changing small molecule repex consistency tests (the latter has not been added yet).
Motivation and context
Resolves #959
Resolves #1061
How has this been tested?
Change log
@ijpulidos: I've pushed the small molecule neutral and charge tests. The charge test isn't done yet, but I was thinking you could take over from here? The charge test is currently yielding the error described here. Once we figure out how to fix that error, you may have to debug a bit more to make sure the test runs without any more errors and then check that it's actually passing.
Note that I had to make some changes to `RelativeFEPSetup` to make sure that the host (cb7.mol2) can be parametrized properly. These changes are necessary because otherwise `RelativeFEPSetup` does not consider the host as a small molecule and therefore does not tell the `SystemGenerator` that it needs to parametrize it as a small molecule.
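For context, here is a minimal sketch of the kind of setup this enables, assuming the host is loaded as an OpenFF `Molecule` and registered with the `SystemGenerator` together with the ligands (variable names and guest file names are illustrative, not the actual `RelativeFEPSetup` code):

```python
from openff.toolkit.topology import Molecule
from openmmforcefields.generators import SystemGenerator

# Illustrative only: load the CB7 host and guests so the SystemGenerator
# knows to parametrize all of them with the small molecule force field.
host = Molecule.from_file("cb7.mol2")
guests = [Molecule.from_file("guest_old.sdf"),   # hypothetical file names
          Molecule.from_file("guest_new.sdf")]

system_generator = SystemGenerator(
    forcefields=["amber/ff14SB.xml", "amber/tip3p_standard.xml"],
    small_molecule_forcefield="gaff-2.11",
    molecules=[host] + guests,  # the host must be registered here as well
)
```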
@ijpulidos: Update: I've fixed the parametrization error in the charge test that I mentioned above.
@zhang-ivy Thanks for this! I'll take a look and let you know if I encounter anything.
We need to switch to the new `LangevinMiddleIntegrator` once we cut a new release of openmmtools.
To speed up this test, see https://github.com/choderalab/perses/issues/1058#issuecomment-1188299488
To use the change in openmmtools 0.21.5 that uses `LangevinMiddleIntegrator`, we would have to change the test to use `LangevinDynamicsMove` instead of `LangevinSplittingDynamicsMove`, but I see that almost all tests (except one) use the latter. How do we want to approach this? Would just changing the move in this test be enough to test what we want to test?
From discussions with @zhang-ivy, I think we want to change all of the tests to use `LangevinDynamicsMove`, which in turn would use the `LangevinMiddleIntegrator` changes in the latest openmmtools.
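For reference, here is a minimal sketch of the swap being discussed, built directly from `openmmtools.mcmc` (the timestep, collision rate, `n_steps`, and splitting values below are illustrative, not the tests' actual settings):

```python
from openmm import unit
from openmmtools import mcmc

# What most tests use today: an explicit splitting string.
old_move = mcmc.LangevinSplittingDynamicsMove(
    timestep=4.0 * unit.femtoseconds,
    collision_rate=1.0 / unit.picoseconds,
    n_steps=250,
    splitting="V R R R O R R R V",
)

# Proposed replacement: LangevinDynamicsMove, which in openmmtools >= 0.21.5
# is backed by OpenMM's LangevinMiddleIntegrator.
new_move = mcmc.LangevinDynamicsMove(
    timestep=4.0 * unit.femtoseconds,
    collision_rate=1.0 / unit.picoseconds,
    n_steps=250,
)
```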
Codecov Report
Merging #1065 (7b16afc) into main (6730065) will decrease coverage by 0.32%. The diff coverage is 3.90%.
As discussed in our dev sync, I saw the following results. According to the log output, it would take more than 4 hours to complete, which seems too long compared to what @zhang-ivy has obtained in other tests.
The script to generate the data is attached. I basically just copied `test_RESTCapableHybridTopologyFactory_repex_charge_transformation` and made it a script. To run it you can use, for example, `python test_repex_charge.py 2> stderr_out.txt`, which writes the stderr output to the specified file. You can generate the plots (assuming you have two log files you want to compare) using the attached `make_plots.py` barebones script (modify as needed).
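If it helps anyone reproduce the comparison, here is a bare-bones sketch of what a script along the lines of `make_plots.py` could look like; it assumes each log file has already been reduced to one wall-clock time per iteration per line, which is a made-up format, so treat the parsing as a placeholder:

```python
import sys
import matplotlib.pyplot as plt

def read_iteration_times(path):
    """Placeholder parser: expects one elapsed time in seconds per line."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]

# Usage: python make_plots.py times_run_a.txt times_run_b.txt
times_a = read_iteration_times(sys.argv[1])
times_b = read_iteration_times(sys.argv[2])

plt.plot(times_a, label="run A")
plt.plot(times_b, label="run B")
plt.xlabel("iteration")
plt.ylabel("time per iteration (s)")
plt.legend()
plt.savefig("iteration_times.png", dpi=150)
```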
We should also try running this with the nightly versions of OpenMM to see whether there are further performance boosts.
~While debugging the performance of this test, a solution to issue #1085 was required; I updated the description of this PR to reflect this.~ This was reverted; see the discussion in https://github.com/choderalab/perses/pull/1065#discussion_r950170592
Notes from discussion with Iván today:
- We are seeing that the small molecule (CCC -> CCCC) neutral test fails, i.e. there is a large discrepancy between the forward and reverse transformations:
| n_iterations | DDG | 6 * dDDG |
| --- | --- | --- |
| 1000 | 2.0556889259223965 | 1.8660036814494445 |
| 2000 | 1.9424681339279566 | 1.504043152162713 |
| 3000 | 2.3012075072431486 | 1.3496196586095943 |
| 4000 | 2.105869914487293 | 1.1557338251054274 |
| 5000 | 2.0091207197835246 | 0.9972790553988209 |
| 6000 | 2.1447482384645653 | 0.9375579903514 |
| 7000 | 1.8431263594498581 | 0.8343179445910711 |
| 8000 | 1.9820627460818656 | 0.7713827653749263 |
| 9000 | 1.9790420115873495 | 0.758837900156935 |
| 10000 | 2.32585423139 | 0.720859505508521 |
- We checked whether the energy validation tests are passing. They are failing, but only by ~0.2 kT. When the dispersion correction is off, the test passes. I think it's ok to proceed despite the small discrepancy. I investigated this in detail and wrote up notes in #1098
- We also checked the YANK analysis notebook for these transformations, and found that the statistical inefficiency is very large:
Any idea why the statistical inefficiency is so large for this transformation?
When we look at the same plot for a tyk2 ligand solvent phase transformation, we do not see such a large statistical inefficiency:
I wonder if we should refrain from using CCC -> CCCC and instead use a tyk2 ligand solvent phase transformation?
We should choose a transformation that has fewer rotatable bonds in one state but still has a large free energy difference, like paracetamol to phenol.
@zhang-ivy I ran the neutral transformation test with phenol to paracetamol using the vanilla HTF, and the results for the DDG and dDDG are as follows:
| n_iterations | DDG | 6 * dDDG |
| --- | --- | --- |
| 1000 | 4.379856483835027 | 6.427969308929397 |
| 2000 | 5.104396507573654 | 5.103675724302308 |
| 3000 | 5.05622797342096 | 4.150472778157009 |
| 4000 | 5.661439924925766 | 3.9788164082347923 |
| 5000 | 4.924333536929069 | 3.4960583717742804 |
| 6000 | 5.675488856850542 | 3.3866050177001448 |
| 7000 | 5.928197475123568 | 3.018527704140613 |
| 8000 | 5.141353236248172 | 2.786667888092715 |
| 9000 | 5.0989841273643535 | 2.5409050042675574 |
| 10000 | 5.524543006009203 | 2.5493728345032727 |
The equilibration plot is the following
@ijpulidos : Can you try generating the same plot with openmmtools 0.21.4? Just want to see if those results also yield a large statistical inefficiency.
@ijpulidos : Did you run phenol to paracetamol with the rest htf as well?
The RestHTF gives better results for the DDG vs. dDDG comparison:
| n_iterations | DDG | 6 * dDDG |
| --- | --- | --- |
| 1000 | 0.7389286664047461 | 4.349966796166879 |
| 2000 | 1.4476000767035657 | 3.35301957925458 |
| 3000 | 1.0050371228929293 | 2.877359027173167 |
| 4000 | 0.5892713335768889 | 2.4970385474617656 |
| 5000 | 0.08935214968687433 | 2.2207918586467272 |
| 6000 | 0.7799885336397239 | 2.0505836836904154 |
| 7000 | 0.17155683937092192 | 1.90492611578378 |
| 8000 | 0.06495242783663002 | 1.7822284655187324 |
| 9000 | 0.294242113389501 | 1.692400658436612 |
| 10000 | 0.05825571585386058 | 1.6147951072659277 |
The equilibration plot for it is as follows
I found a "strange" thing happening with the mapper, which may explain the differences. Left is RestHTF, right is Vanilla HTF:
I don't think this is a bug, it's just that the `debug.png` file getting generated here is not the best mapping but just whatever was last in the `atom_mappings` set. We probably do want to export the best mapping instead, though.
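If we do want `debug.png` to show the best mapping, a sketch of the change might look like the following; I'm assuming the newer `perses.rjmc.atom_mapping.AtomMapper` API with `get_best_mapping` and `render_image` (those method names are an assumption and should be checked against the code path that currently writes `debug.png`):

```python
from openff.toolkit.topology import Molecule
from perses.rjmc.atom_mapping import AtomMapper

# Assumed API: render the best-scoring mapping rather than whatever mapping
# happened to be iterated last in the atom_mappings set.
old_mol = Molecule.from_smiles("c1ccc(O)cc1")            # phenol
new_mol = Molecule.from_smiles("CC(=O)Nc1ccc(O)cc1")     # paracetamol
atom_mapper = AtomMapper()
best_mapping = atom_mapper.get_best_mapping(old_mol, new_mol)
best_mapping.render_image("debug.png")
```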
Notes from dev sync today:
- We should try setting `n_steps` per iteration = 250 instead of 50. This will hopefully cause the subsample rate to decrease and should decrease the number of iterations needed for DDG < 6 * dDDG (a sketch of how this check is evaluated is shown below).
- We still don't know why the test fails with the vanilla HTF but not the REST HTF; we need to investigate this. @ijpulidos, let me know if you want to brainstorm how to do this.
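For reference (as mentioned in the first bullet above), here is a minimal sketch of how the DDG vs. 6 * dDDG check can be evaluated at increasing iteration counts from the repex reporters, assuming the standard `openmmtools.multistate` analysis API; the reporter file names and the forward/reverse pairing are assumptions, not the test's exact code:

```python
import numpy as np
from openmmtools.multistate import MultiStateReporter, MultiStateSamplerAnalyzer

def free_energy(nc_path, max_n_iterations):
    """Return (DG, dDG) in kT for one leg, truncated at max_n_iterations."""
    reporter = MultiStateReporter(nc_path, open_mode="r")
    analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=max_n_iterations)
    f_ij, df_ij = analyzer.get_free_energy()
    return f_ij[0, -1], df_ij[0, -1]

for n_iterations in range(1000, 11000, 1000):
    # Hypothetical reporter file names for the two legs of the cycle.
    dg_fwd, ddg_fwd = free_energy("forward.nc", n_iterations)
    dg_rev, ddg_rev = free_energy("reverse.nc", n_iterations)
    DDG = abs(dg_fwd + dg_rev)                 # should cancel if consistent
    dDDG = np.sqrt(ddg_fwd**2 + ddg_rev**2)
    print(f"n_iterations: {n_iterations}, DDG ({DDG}), 6 * dDDG ({6 * dDDG})")
```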
I've updated this branch/PR with updated logging and the new OpenMM 8 RC so we can see how that does.
These are the slowest tests
5637.75s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_mutation
1613.93s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_mutation
1569.07s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_transformation
1179.05s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_transformation
Need to fix (doesn't need to be fixed in this PR):
- `subprocess.run("perses-relative test.yml", shell=True, check=True, env=env)`
- `test_RESTCapableHybridTopologyFactory_repex_charge_transformation`
FAILED perses/tests/test_relative.py::test_RESTCapableHybridTopologyFactory_energies_GPU - ValueError: Could not locate file "gaff.xml"
FAILED perses/tests/test_relative.py::test_unsampled_endstate_energies_GPU - ValueError: Could not locate file "gaff.xml"
FAILED perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_transformation - AssertionError: DDG (2.151708958621054) is greater than 6 * dDDG (1.1169450391489755)
FAILED perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_transformation - AssertionError: DDG (22.989485678075503) is greater than 6 * dDDG (12.23018630199661)
FAILED perses/tests/test_resume.py::test_cli_resume_repex - subprocess.CalledProcessError: Command 'perses-relative test.yml' returned non-zero exit ...
assert DDG < 6 * dDDG, f"DDG ({DDG}) is greater than 6 * dDDG ({6 * dDDG})"
E AssertionError: DDG (22.989485678075503) is greater than 6 * dDDG (12.23018630199661)
E assert 22.989485678075503 < (6 * 2.038364383666101
FAILED perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_transformation - AssertionError: DDG (1.9296819854921097) is greater than 6 * dDDG (0.8521151431457088)
FAILED perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_transformation - AssertionError: DDG (23.089451730916885) is greater than 6 * dDDG (11.498215681031873)
@ijpulidos @mikemhenry: The first thing I would do to fix these failing tests is to increase the number of steps per iteration to 250 (as we discussed earlier). If that doesn't fix things, I would increase the number of iterations until the tests pass. It might be best to run these locally (on lilac or your external GPU) first to make sure you can get the tests to converge with the changes I recommended.
As far as I've tested, the protein mutation tests pass with 1000 iterations and 250 steps (`n_steps`), whereas the small molecule transformations fail even with 2000 iterations and 250 steps. The times they took are as follows (these were run on different GPUs; it is hard to test on the same GPU due to availability):
Passing tests:
19356.48s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_mutation
1558.90s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_mutation
Failing tests:
10512.75s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_transformation
3911.30s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_transformation
That means we are talking about at least 10 hrs for these tests to run.
I think we already noticed the issues with the small molecule transformation tests in https://github.com/choderalab/perses/pull/1065#issuecomment-1243797927. I'll try running the tests with the transformation suggested in that comment and check the results.
~@ijpulidos : Can you try running the small molecule tests even longer (5 ns)? If they are still failing even with 5000 iterations, there may be a more significant problem here.~ Nevermind, we already know from https://github.com/choderalab/perses/pull/1065#issuecomment-1233507210 that even 10000 iterations isn't enough for the small molecule neutral test to converge.
Ah, actually, ignore my above comment; I think you're right that we should change the tests to use phenol to paracetamol. But I'm not sure which transformation to use for the small molecule charge-changing tests...
The failing tests are unrelated to the changes in this PR; we'll address them in a different PR.
I can review this later tonight or tomorrow! Please wait to merge. Thanks!
Reviewing this now. GitHub won't let me comment on lines that haven't been changed, but I noticed that this line: "Note: We are using 50 steps per iteration here to speed up the test. We expect larger DDGs and dDDGs as a result." is out of date and can be removed.