Add small molecule repex consistency tests
Description
Adds neutral and charge-changing small molecule repex consistency tests (the latter has not been added yet).
Motivation and context
Resolves #959
Resolves #1061
How has this been tested?
Change log
@ijpulidos: I've pushed the small molecule neutral and charge tests. The charge test isn't done yet, but I was thinking you could take over from here? The charge test is currently yielding the error described here. Once we figure out how to fix that error, you may have to debug a bit more to make sure the test runs without any more errors and then check that it's actually passing.
Note that I had to make some changes to `RelativeFEPSetup` to make sure that the host (cb7.mol2) can be parametrized properly. These changes are necessary because otherwise `RelativeFEPSetup` does not consider the host as a small molecule and therefore does not tell the `SystemGenerator` that it needs to parametrize it as a small molecule.
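For context, here is a minimal sketch of the kind of setup this enables, assuming the host is loaded as an OpenFF `Molecule` and registered with the `SystemGenerator` together with the ligands (variable names and guest file names are illustrative, not the actual `RelativeFEPSetup` code):

```python
from openff.toolkit.topology import Molecule
from openmmforcefields.generators import SystemGenerator

# Illustrative only: load the CB7 host and guests so the SystemGenerator
# knows to parametrize all of them with the small molecule force field.
host = Molecule.from_file("cb7.mol2")
guests = [Molecule.from_file("guest_old.sdf"),   # hypothetical file names
          Molecule.from_file("guest_new.sdf")]

system_generator = SystemGenerator(
    forcefields=["amber/ff14SB.xml", "amber/tip3p_standard.xml"],
    small_molecule_forcefield="gaff-2.11",
    molecules=[host] + guests,  # the host must be registered here as well
)
```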
@ijpulidos: Update: I've fixed the parametrization error in the charge test that I mentioned above.
@zhang-ivy Thanks for this! I'll take a look and let you know if I encounter anything.
We need to switch to the new `LangevinMiddleIntegrator` once we cut a new release of openmmtools.
To speed up this test, see https://github.com/choderalab/perses/issues/1058#issuecomment-1188299488
To use the change in openmmtools 0.21.5 that uses `LangevinMiddleIntegrator`, we would have to change the test to use `LangevinDynamicsMove` instead of `LangevinSplittingDynamicsMove`, but I see that almost all tests (except one) use the latter. How do we want to approach this? Would just changing the move in this test be enough to test what we want to test?
From discussions with @zhang-ivy, I think we want to change all of the tests to use `LangevinDynamicsMove`, which in turn would use the `LangevinMiddleIntegrator` changes in the latest openmmtools.
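For reference, here is a minimal sketch of the swap being discussed, built directly from `openmmtools.mcmc` (the timestep, collision rate, `n_steps`, and splitting values below are illustrative, not the tests' actual settings):

```python
from openmm import unit
from openmmtools import mcmc

# What most tests use today: an explicit splitting string.
old_move = mcmc.LangevinSplittingDynamicsMove(
    timestep=4.0 * unit.femtoseconds,
    collision_rate=1.0 / unit.picoseconds,
    n_steps=250,
    splitting="V R R R O R R R V",
)

# Proposed replacement: LangevinDynamicsMove, which in openmmtools >= 0.21.5
# is backed by OpenMM's LangevinMiddleIntegrator.
new_move = mcmc.LangevinDynamicsMove(
    timestep=4.0 * unit.femtoseconds,
    collision_rate=1.0 / unit.picoseconds,
    n_steps=250,
)
```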
Codecov Report
Merging #1065 (7b16afc) into main (6730065) will decrease coverage by 0.32%. The diff coverage is 3.90%.
As discussed in our dev sync, I saw the following results. According to the log output, it would take more than 4 hours to complete, which seems too long compared to what @zhang-ivy has obtained in other tests.
The script to generate the data is attached. I basically just copied `test_RESTCapableHybridTopologyFactory_repex_charge_transformation` and made it a script. To run it you can use, for example, `python test_repex_charge.py 2> stderr_out.txt`, which writes the stderr output to the specified file. You can generate the plots (assuming you have two log files you want to compare) using the attached `make_plots.py` barebones script (modify as needed).
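If it helps anyone reproduce the comparison, here is a bare-bones sketch of what a script along the lines of `make_plots.py` could look like; it assumes each log file has already been reduced to one wall-clock time per iteration per line, which is a made-up format, so treat the parsing as a placeholder:

```python
import sys
import matplotlib.pyplot as plt

def read_iteration_times(path):
    """Placeholder parser: expects one elapsed time in seconds per line."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]

# Usage: python make_plots.py times_run_a.txt times_run_b.txt
times_a = read_iteration_times(sys.argv[1])
times_b = read_iteration_times(sys.argv[2])

plt.plot(times_a, label="run A")
plt.plot(times_b, label="run B")
plt.xlabel("iteration")
plt.ylabel("time per iteration (s)")
plt.legend()
plt.savefig("iteration_times.png", dpi=150)
```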
We should also try running this with the nightly versions of OpenMM to see whether there are further performance boosts.
~While debugging the performance of this test, a solution to issue #1085 was required; I updated the description of this PR to reflect this.~ This was reverted; see the discussion in https://github.com/choderalab/perses/pull/1065#discussion_r950170592
Notes from discussion with Iván today:
- We are seeing that the small molecule (CCC -> CCCC) neutral test fails, i.e. there is a large discrepancy between the forward and reverse transformations:
| n_iterations | DDG | 6 * dDDG |
| --- | --- | --- |
| 1000 | 2.0556889259223965 | 1.8660036814494445 |
| 2000 | 1.9424681339279566 | 1.504043152162713 |
| 3000 | 2.3012075072431486 | 1.3496196586095943 |
| 4000 | 2.105869914487293 | 1.1557338251054274 |
| 5000 | 2.0091207197835246 | 0.9972790553988209 |
| 6000 | 2.1447482384645653 | 0.9375579903514 |
| 7000 | 1.8431263594498581 | 0.8343179445910711 |
| 8000 | 1.9820627460818656 | 0.7713827653749263 |
| 9000 | 1.9790420115873495 | 0.758837900156935 |
| 10000 | 2.32585423139 | 0.720859505508521 |
- We checked whether the energy validation tests are passing. They are failing, but only by ~0.2 kT. When the dispersion correction is off, the test passes. I think it's ok to proceed despite the small discrepancy. I investigated this in detail and wrote up notes in #1098
- We also checked the YANK analysis notebook for these transformations, and found that the statistical inefficiency is very large:
Any idea why the statistical inefficiency is so large for this transformation?
When we look at the same plot for a tyk2 ligand solvent phase transformation, we do not see such a large statistical inefficiency:
I wonder if we should refrain from using CCC -> CCCC and instead use a tyk2 ligand solvent phase transformation?
We should choose a transformation that has fewer rotatable bonds in one state but still has a large free energy difference, like paracetamol to phenol.
@zhang-ivy I ran the neutral transformation test with phenol to paracetamol using the vanilla HTF, and the results for the DDG and dDDG are as follows:
| n_iterations | DDG | 6 * dDDG |
| --- | --- | --- |
| 1000 | 4.379856483835027 | 6.427969308929397 |
| 2000 | 5.104396507573654 | 5.103675724302308 |
| 3000 | 5.05622797342096 | 4.150472778157009 |
| 4000 | 5.661439924925766 | 3.9788164082347923 |
| 5000 | 4.924333536929069 | 3.4960583717742804 |
| 6000 | 5.675488856850542 | 3.3866050177001448 |
| 7000 | 5.928197475123568 | 3.018527704140613 |
| 8000 | 5.141353236248172 | 2.786667888092715 |
| 9000 | 5.0989841273643535 | 2.5409050042675574 |
| 10000 | 5.524543006009203 | 2.5493728345032727 |
The equilibration plot is the following
@ijpulidos : Can you try generating the same plot with openmmtools 0.21.4? Just want to see if those results also yield a large statistical inefficiency.
@ijpulidos : Did you run phenol to paracetamol with the rest htf as well?
The RestHTF gives better results for the DDG vs. dDDG comparison:
| n_iterations | DDG | 6 * dDDG |
| --- | --- | --- |
| 1000 | 0.7389286664047461 | 4.349966796166879 |
| 2000 | 1.4476000767035657 | 3.35301957925458 |
| 3000 | 1.0050371228929293 | 2.877359027173167 |
| 4000 | 0.5892713335768889 | 2.4970385474617656 |
| 5000 | 0.08935214968687433 | 2.2207918586467272 |
| 6000 | 0.7799885336397239 | 2.0505836836904154 |
| 7000 | 0.17155683937092192 | 1.90492611578378 |
| 8000 | 0.06495242783663002 | 1.7822284655187324 |
| 9000 | 0.294242113389501 | 1.692400658436612 |
| 10000 | 0.05825571585386058 | 1.6147951072659277 |
The equilibration plot for it is as follows
I found a "strange" thing happening with the mapper, which may explain the differences. Left is RestHTF, right is Vanilla HTF:
I don't think this is a bug, it's just that the `debug.png` file getting generated here is not the best mapping but just whatever was last in the `atom_mappings` set. We probably do want to export the best mapping instead, though.
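If we do want `debug.png` to show the best mapping, a sketch of the change might look like the following; I'm assuming the newer `perses.rjmc.atom_mapping.AtomMapper` API with `get_best_mapping` and `render_image` (those method names are an assumption and should be checked against the code path that currently writes `debug.png`):

```python
from openff.toolkit.topology import Molecule
from perses.rjmc.atom_mapping import AtomMapper

# Assumed API: render the best-scoring mapping rather than whatever mapping
# happened to be iterated last in the atom_mappings set.
old_mol = Molecule.from_smiles("c1ccc(O)cc1")            # phenol
new_mol = Molecule.from_smiles("CC(=O)Nc1ccc(O)cc1")     # paracetamol
atom_mapper = AtomMapper()
best_mapping = atom_mapper.get_best_mapping(old_mol, new_mol)
best_mapping.render_image("debug.png")
```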
Notes from dev sync today:
- We should try setting `n_steps` per iteration = 250 instead of 50. This will hopefully cause the subsample rate to decrease and should decrease the number of iterations needed for DDG < 6 * dDDG (a sketch of how this check is evaluated is shown below).
- We still don't know why the test fails with the vanilla HTF but not the REST HTF; we need to investigate this. @ijpulidos, let me know if you want to brainstorm how to do this.
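For reference (as mentioned in the first bullet above), here is a minimal sketch of how the DDG vs. 6 * dDDG check can be evaluated at increasing iteration counts from the repex reporters, assuming the standard `openmmtools.multistate` analysis API; the reporter file names and the forward/reverse pairing are assumptions, not the test's exact code:

```python
import numpy as np
from openmmtools.multistate import MultiStateReporter, MultiStateSamplerAnalyzer

def free_energy(nc_path, max_n_iterations):
    """Return (DG, dDG) in kT for one leg, truncated at max_n_iterations."""
    reporter = MultiStateReporter(nc_path, open_mode="r")
    analyzer = MultiStateSamplerAnalyzer(reporter, max_n_iterations=max_n_iterations)
    f_ij, df_ij = analyzer.get_free_energy()
    return f_ij[0, -1], df_ij[0, -1]

for n_iterations in range(1000, 11000, 1000):
    # Hypothetical reporter file names for the two legs of the cycle.
    dg_fwd, ddg_fwd = free_energy("forward.nc", n_iterations)
    dg_rev, ddg_rev = free_energy("reverse.nc", n_iterations)
    DDG = abs(dg_fwd + dg_rev)                 # should cancel if consistent
    dDDG = np.sqrt(ddg_fwd**2 + ddg_rev**2)
    print(f"n_iterations: {n_iterations}, DDG ({DDG}), 6 * dDDG ({6 * dDDG})")
```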
I've updated this branch/PR with updated logging and the new OpenMM 8 RC so we can see how that does.
These are the slowest tests
5637.75s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_mutation
1613.93s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_mutation
1569.07s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_transformation
1179.05s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_transformation
Need to fix (doesn't need to be fixed in this PR):
- `subprocess.run("perses-relative test.yml", shell=True, check=True, env=env)`
- `test_RESTCapableHybridTopologyFactory_repex_charge_transformation`
FAILED perses/tests/test_relative.py::test_RESTCapableHybridTopologyFactory_energies_GPU - ValueError: Could not locate file "gaff.xml"
FAILED perses/tests/test_relative.py::test_unsampled_endstate_energies_GPU - ValueError: Could not locate file "gaff.xml"
FAILED perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_transformation - AssertionError: DDG (2.151708958621054) is greater than 6 * dDDG (1.1169450391489755)
FAILED perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_transformation - AssertionError: DDG (22.989485678075503) is greater than 6 * dDDG (12.23018630199661)
FAILED perses/tests/test_resume.py::test_cli_resume_repex - subprocess.CalledProcessError: Command 'perses-relative test.yml' returned non-zero exit ...
assert DDG < 6 * dDDG, f"DDG ({DDG}) is greater than 6 * dDDG ({6 * dDDG})"
E AssertionError: DDG (22.989485678075503) is greater than 6 * dDDG (12.23018630199661)
E assert 22.989485678075503 < (6 * 2.038364383666101
FAILED perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_transformation - AssertionError: DDG (1.9296819854921097) is greater than 6 * dDDG (0.8521151431457088)
FAILED perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_transformation - AssertionError: DDG (23.089451730916885) is greater than 6 * dDDG (11.498215681031873)
@ijpulidos @mikemhenry: The first thing I would do to fix these failing tests is to increase the number of steps per iteration to 250 (as we discussed earlier). If that doesn't fix things, I would increase the number of iterations until the tests pass. It might be best to run these locally (on lilac or your external GPU) first to make sure you can get the tests to converge with the changes I recommended.
As far as I've tested, the protein mutation tests pass with 1000 iterations and 250 steps (`n_steps`), whereas the small molecule transformations fail even with 2000 iterations and 250 steps. The times they took are as follows (these were run on different GPUs; it is hard to test on the same GPU due to availability):
Passing tests:
19356.48s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_mutation
1558.90s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_mutation
Failing tests:
10512.75s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_charge_transformation
3911.30s call perses/tests/test_repex.py::test_RESTCapableHybridTopologyFactory_repex_neutral_transformation
That means we are talking about at least 10 hrs for these tests to run.
I think we already noticed the issues with the small molecule transformation tests in https://github.com/choderalab/perses/pull/1065#issuecomment-1243797927. I'll try running the tests with the transformation suggested in that comment and check the results.
~@ijpulidos : Can you try running the small molecule tests even longer (5 ns)? If they are still failing even with 5000 iterations, there may be a more significant problem here.~ Nevermind, we already know from https://github.com/choderalab/perses/pull/1065#issuecomment-1233507210 that even 10000 iterations isn't enough for the small molecule neutral test to converge.
Ah, actually, ignore my above comment; I think you're right that we should change the tests to use phenol to paracetamol. But I'm not sure which transformation to use for the small molecule charge-changing tests...
The failing tests are unrelated to the changes in this PR; we'll address them in a different PR.
I can review this later tonight or tomorrow! Please wait to merge. Thanks!
Reviewing this now. GitHub won't let me comment on lines that haven't been changed, but I noticed that this line: "Note: We are using 50 steps per iteration here to speed up the test. We expect larger DDGs and dDDGs as a result." is out of date and can be removed.