protein-ligand-benchmark
Changes PLB from openfe
As discussed with @ijpulidos and @jchodera, here is the PR with the changes that we made to the PLB.
See below for a list of things that have been changed in this PR.
Benchmarking results obtained using this PR:
- https://docs.google.com/document/d/1Gk1LCH8OyPrJnel8dPfmcRw0IVTF1y1Pizo4inO_1jQ/edit?usp=sharing
- cdk2, hif2a, mcl1, p38, pde2, pfkfb3, ptp1b, shp2, tyk2
- Some calculations are still running; this document will be updated as more of them finish
- We will move to a better way of storing our benchmarking data in the future
Systems that have not been benchmarked yet as they involve net charge changes:
- cdk8, cmet, eg5, syk, thrombin, tnks2
We added tags to the systems in a separate PR, highlighting some challenges of the specific systems
- https://github.com/OpenFreeEnergy/protein-ligand-benchmark/pull/14
Some additional notes on potential problems that could be fixed in future releases of the PLB:
- https://github.com/OpenFreeEnergy/protein-ligand-benchmark/issues/15
Things changed in the PR
Thrombin:
- Problem 1
- Lig_7d deposited 3x (3 different stereoisomers)
- Action
- Removed ligand lig_7d
- Rationale
- unspecified absolute configuration in the experimental data (https://www.sciencedirect.com/science/article/abs/pii/S0022283609005075?via%3Dihub)
- Details in this issue
- https://github.com/OpenFreeEnergy/protein-ligand-benchmark/issues/5
- Problem 2
- The primary amine that is present in all ligands is neutral, but was charged (+1) in the old PLB (Hahn et al.)
- Action:
- Changed the protonation state of the primary amine to be +1
- Rationale:
- According to the pKa prediction (performed after the change), both protonation states are likely to be present at pH 7
- One could also use the neutral state of the primary amine
- Details in this issue
- https://github.com/OpenFreeEnergy/protein-ligand-benchmark/issues/7
SYK
- Problem
- Ligands lig_CHEMBL3265030_n and lig_CHEMBL3265035 both have two copies of the ligand in the .sdf; these are two "stereoisomers" (around a tetravalent nitrogen, so not a real stereo center, but an "MD stereo center")
- Both ligands were neutral in the old PLB (Hahn et al.) and the Schrodinger set (Ross et al.)
- Rationale
- According to the pKa prediction (done with Simulations Plus ADMETpredictor V.11 in the context of the OpenFF consortium for method development), these ligands should be neutral
- Action
- Changed the protonation state of ligands lig_CHEMBL3265030_n and lig_CHEMBL3265035 according to the pKa prediction
- Details in this issue
- https://github.com/OpenFreeEnergy/protein-ligand-benchmark/issues/2
P38
- Problem
- Lig_p38a_2ff was deposited twice with two orientations of the cyclohexyl ring
- Action
- Removed one configuration
- Rationale
- kept the configuration that was more similar to the one in the old PLB (Hahn et al.) and in the dataset from Schrodinger (Ross et al.)
- Details in this issue
- https://github.com/OpenFreeEnergy/protein-ligand-benchmark/issues/4
Eg5
- Problem
- ligand lig_CHEMBL1084935 was present twice, with two different “stereocenters” (around a tertiary charged amine)
- Action
- Kept the same isomer as was used in the old PLB (Hahn et al.) and in the dataset from Schrodinger (Ross et al.), discarded the other stereoisomer
- Rationale
- Stereo center will not interconvert in the simulation because it is tetravalent and would require constant pH MD to interconvert
- Details in this issue
- https://github.com/OpenFreeEnergy/protein-ligand-benchmark/issues/3
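Several of the problems above (thrombin, SYK, p38, Eg5) come down to one ligand name appearing more than once in an .sdf file. A minimal sketch of how such duplicates could be flagged is shown here. It is not the procedure used in this PR; it relies only on the SDF convention that each record starts with a title line and ends with `$$$$`. The function names and the toy records are hypothetical, and the toy molblocks are placeholders rather than valid structures.

```python
from collections import Counter


def sdf_titles(sdf_text: str) -> list[str]:
    """Return the title line (first line) of every `$$$$`-terminated SDF record."""
    titles = []
    lines = sdf_text.splitlines()
    start = 0  # index of the first line of the current record
    for i, line in enumerate(lines):
        if line.strip() == "$$$$":
            if start < i:
                titles.append(lines[start].strip())
            start = i + 1
    return titles


def duplicated_names(sdf_text: str) -> dict[str, int]:
    """Map each ligand name that occurs more than once to its number of copies."""
    counts = Counter(sdf_titles(sdf_text))
    return {name: n for name, n in counts.items() if n > 1}


def toy_record(name: str) -> list[str]:
    """Build a placeholder record: title line, dummy body, record terminator."""
    return [name, "  placeholder molblock", "", "M  END", "$$$$"]


# Hypothetical input: one ligand deposited three times, another one once.
toy_sdf = "\n".join(
    toy_record("lig_7d") + toy_record("lig_7d") + toy_record("lig_7d") + toy_record("lig_1a")
)

print(duplicated_names(toy_sdf))
```

A name-based check like this only finds copies that share a title line; deciding which copy to keep still needs the chemical reasoning given in the rationale for each system.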
Tnks2
- Problem
- lig_7 was negatively charged: all other ligands in the series have the same core, but with a different protonation state of that core
- Rationale
- According to the pKa prediction (using Simulations Plus ADMETpredictor V.11), these ligands should be neutral at pH 7 (though the predicted pKa of 8.21 is fairly close to that pH)
- Action
- Changed the protonation state of this ligand such that the core of all ligands in this set has the same net charge
- Details in this issue
- https://github.com/OpenFreeEnergy/protein-ligand-benchmark/issues/1
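To put the "fairly close" pKa in perspective, here is a rough Henderson-Hasselbalch estimate of how much of the ligand would be charged at pH 7. This is only a sketch: it assumes that the predicted pKa of 8.21 belongs to the acidic group whose deprotonation gives the negative charge, and that this group behaves as an isolated monoprotic acid. Neither assumption is stated in the PR, and the function name is hypothetical.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a monoprotic acid present as A- at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Assumption: pKa 8.21 describes the group that is anionic when deprotonated.
frac_anionic = fraction_deprotonated(pka=8.21, ph=7.0)

print(f"anionic: {frac_anionic:.1%}, neutral: {1.0 - frac_anionic:.1%}")
```

Under these assumptions the neutral form dominates at pH 7, in line with the choice made here, while a minor anionic population remains.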
PFKFB3
- Problem
- Cofactors were not readable by rdkit
- Action
- Fixed cofactors
- Note: for POP, we used the protonation state suggested by OpenEye, but a different protonation state would also be possible
PDE2
- Problem
- The sidechain of two of the ligands (49932129 and 50107616) pointed in a different direction than in the other ligands and towards the ions, leading to sampling problems
- Action
- Changed the orientation of the sidechain to be more similar to that in the other ligands and in other co-crystal ligands, e.g. PDB ID 4d08
Networks
Generated new networks
- Kartograf mapper, Lomap scorer
- MST, radial, Lomap networks
- Allow element changes
- Not allowing element changes
- Lomap mapper, Lomap scorer
- MST, Lomap networks
- Allow element changes
- Not allowing element changes
@hannahbaumann , this is amazing work! Thanks for this contribution. I checked the connectivity of the networks and I found out that some of them are disconnected (maybe that's expected?). Specifically:
TARGET: cdk8
NETWORK: 03_edges/lomap_mapper_lomap_network.yml, is connected: False
NETWORK: 03_edges/lomap_mapper_lomap_network_no_element_changes.yml, is connected: False
TARGET: eg5
NETWORK: 03_edges/lomap_mapper_lomap_network.yml, is connected: False
NETWORK: 03_edges/lomap_mapper_lomap_network_no_element_changes.yml, is connected: False
TARGET: syk
NETWORK: 03_edges/lomap_mapper_lomap_network.yml, is connected: False
NETWORK: 03_edges/lomap_mapper_lomap_network_no_element_changes.yml, is connected: False
TARGET: thrombin
NETWORK: 03_edges/lomap_mapper_lomap_network.yml, is connected: False
NETWORK: 03_edges/lomap_mapper_lomap_network_no_element_changes.yml, is connected: False
TARGET: tnks2
NETWORK: 03_edges/lomap_mapper_lomap_network.yml, is connected: False
NETWORK: 03_edges/lomap_mapper_lomap_network_no_element_changes.yml, is connected: False
I took the liberty to do a basic inspection/visualization of the networks and print the "disconnected components" (at least the smallest ones). You can check that at the bottom of this notebook, just in case it helps to know which are the ligands that are disconnected.
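For reference, a connectivity check of this kind can be sketched with the standard library alone. This is not the code from the notebook; it assumes the network has already been read into a list of ligand names and a list of ligand pairs (the layout of the .yml edge files is not assumed here), and the function name and toy ligand names are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for node in nodes:
        if node in seen:
            continue
        # Depth-first traversal collecting everything reachable from `node`.
        stack = [node]
        seen.add(node)
        component = set()
        while stack:
            current = stack.pop()
            component.add(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical toy network: lig_d has no edge to the other ligands.
ligands = ["lig_a", "lig_b", "lig_c", "lig_d"]
edges = [("lig_a", "lig_b"), ("lig_b", "lig_c")]

components = connected_components(ligands, edges)
print("is connected:", len(components) == 1)
print("smallest component:", sorted(components[-1]))
```

Passing the full ligand list separately from the edges matters: a ligand that appears in no edge at all would otherwise be missed instead of being reported as its own component.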
Thank you so much @ijpulidos , this is super helpful! It looks like it's the 5 systems that involve net charge changes. I'll be looking into this!
@ijpulidos: Here are the input .graphml files for systems MCL1, TYK2, p38, cdk2, hif2a:
ligand_network_mcl1.graphml.zip
ligand_network_tyk2.graphml.zip
ligand_network_p38.graphml.zip
ligand_network_cdk2.graphml.zip
ligand_network_hif2a.graphml.zip
I think Lomap has been modified to be able to generate connected networks when net charge transformations are present. We should probably regenerate these networks here with the latest version of Lomap.
Not had time to look at this sorry - Just a quick 2 mins input, I think some of the ligands.yaml entries for the formal charges need changing.
@IAlibay Can you specify which ones? And should we capture this in this PR or make the issue and handle it in another one?