biosimspace icon indicating copy to clipboard operation
biosimspace copied to clipboard

Matching atoms and merging ring systems (fused rings, different hybridisation states) - issues

Open annamherz opened this issue 1 year ago • 8 comments

Hello! I'm currently matching and merging molecules for MCL1 ligands. The basic code used to replicate these is:

# ligands
lig0 = "lig_27"
lig1 = "lig_42"
# options
prematch = {}
ring_size = False
ring_break = False
complete_ring = True

lig_0 = BSS.IO.readMolecules([f"{lig0}.rst7", f"{lig0}.prm7"])
lig_1 = BSS.IO.readMolecules([f"{lig1}.rst7", f"{lig1}prm7"])

mapping = BSS.Align.matchAtoms(
                            lig_0, lig_1,
                            complete_rings_only=complete_ring,
                            prematch=prematch,
                            )    

inv_mapping = {v: k for k, v in mapping.items()}
lig_1_a = BSS.Align.rmsdAlign(lig_1, lig_0, inv_mapping)

merged_ligands = BSS.Align.merge(lig_0, lig_1_a, mapping,
                                allow_ring_breaking=ring_break,
                                allow_ring_size_change=complete_ring
                                  )

With the files named as the ligands attached. ligands.zip

There are some different situations that occur in relation to ring systems, outlined below.

Case 1

In the most basic case, as in the case of some ligands (lig_27~lig_42), this does not proceed correctly initially. Many of the atoms are included in the perturbable region and some of the shared atoms are not included in the MCS. image

When the mapping dictionary {21:6} is added as prematch, this is then able to proceed okay. Prematch for this ligand series is chosen so the N in the core region match. image Notably, this is able to proceed with ring_size=False and ring_break=False and the ring is mapped correctly to the fused ring system, which is not the case in Case 2.

Case 2

However in a another case, lig_27~lig_43, when the mapping dictionary{ 21:26 } which is also the N is added, this is not able to proceed without setting ring_size=True, ring_break=True . In this case then the resulting perturbed region is like below: image I think ideally, with the introduced mapping, this should still be able to proceed with ring_size and ring_break set as False, and the whole ring region in this case is taken as the perturbed region in each case. So for lig_27 the phenyl ring and for lig_43 the entire fused ring system. Then if these were both set to True, in that case the phenyl ring should ideally be mapped onto an entire ring in the fused phenyl as opposed to split between the two fused rings. The difference between lig_42 and lig_43 in this case is minor, with lig_42 being a 1-napthyl group and lig_43 being a 2-napthyl group, but this causes a large difference in the mapping.

Case 3

Here, lig_43~lig_45, there is a change of hybridisation state with the rings. Again, without prematch, the core region of the molecule is ignored: image

And when prematch {26:6} is introduced, with ring_size=False, ring_break=False it then maps the differently hybridised C to each other. The differently hybridised atoms should not map to each other as this is not great for the free energy perturbation. image

I was wondering if there would be some way to programmatically address the mapping and merging ring perturbations so the intended perturbable molecule can be obtained? I guess the atoms could also be removed manually from the generated mapping that is passed into the merge function, however this requires quite some time per perturbation to do manually as each atom needs to be inspected then.

Thank you!

annamherz avatar Jun 15 '23 13:06 annamherz