gmx_MMPBSA icon indicating copy to clipboard operation
gmx_MMPBSA copied to clipboard

[Bug-gmx_MMPBSA]: Bug in gmx_MMPBSA: Alanine scanning

Open leiqian-nmsu opened this issue 1 year ago • 15 comments

Bug summary

Tried to use gmx_MMPBSA: Alanine scanning function to mutate a residue (on the interface of two proteins) to Alanine. But it could not work.

Terminal output

[INFO   ] Building tleap input files...
  File "/home/dell/miniconda3/envs/gmxMMPBSA/bin/gmx_MMPBSA", line 8, in <module>
    sys.exit(gmxmmpbsa())
  File "/home/dell/miniconda3/envs/gmxMMPBSA/lib/python3.9/site-packages/GMXMMPBSA/app.py", line 98, in gmxmmpbsa
    app.make_prmtops()
  File "/home/dell/miniconda3/envs/gmxMMPBSA/lib/python3.9/site-packages/GMXMMPBSA/main.py", line 545, in make_prmtops
    self.FILES.mutant_receptor_prmtop, self.FILES.mutant_ligand_prmtop) = maketop.buildTopology()
  File "/home/dell/miniconda3/envs/gmxMMPBSA/lib/python3.9/site-packages/GMXMMPBSA/make_top.py", line 121, in buildTopology
    tops = self.makeToptleap()
  File "/home/dell/miniconda3/envs/gmxMMPBSA/lib/python3.9/site-packages/GMXMMPBSA/make_top.py", line 1399, in makeToptleap
    tif.write(f'bond MREC_OUT.{cys1}.SG MREC_OUT.{cys2}.SG\n')
ValueError: I/O operation on closed file.
Error occurred on rank 0.
Exiting. All files have been retained.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

gmx_MMPBSA.log

gmx_MMPBSA.log

Operating system

Ubuntu 20.04

gmx_MMPBSA Version

gmx_MMPBSA v1.5.6+21.gc3e03f6 based on MMPBSA version 16.0 and AmberTools 20

Python version

PYTHON VERSION: 3.9.12

Installation

conda AmberTools + conda

leiqian-nmsu avatar Aug 16 '22 16:08 leiqian-nmsu

The Amber topology generation from the structure will be removed in future versions. Please add the -cp topol.top option.

Valdes-Tresanco-MS avatar Aug 16 '22 16:08 Valdes-Tresanco-MS

@Valdes-Tresanco-MS Thanks! It works.

Actually when I ran regular GB only, I did not include my own topology files in working directory (and did not use -cp flag), and they worked well; However, when I ran GB+AlaScan, in order to avoid errors I have to include my own topology files and force field files in working directory and use -cp flag.

In addition, I notice that for the same protein-protein system, when I ran GB only using igb=2 and PBRadii=3, I would get a very high interaction entropy (>100); however, when I ran GB+AlaScan using igb=8 and PBRadii=4, the interaction entropy could be relatively normal: around 40. My question is: do these two sets of parameters really make such big difference ? Thank you!

leiqian-nmsu avatar Aug 17 '22 10:08 leiqian-nmsu

In principle, different GB models won't give you different IE entropies since IE depends on the Interaction energy... could you please send us the files you are using to see what's going wrong?

marioernestovaldes avatar Aug 17 '22 10:08 marioernestovaldes

The IE is calculated from ΔGGas, so we would not expect such a significant variation with the two configurations. However, by defining the cas_intdiel variable, the internal dielectric constant is automatically changed, thus affecting the electrostatic component and hence ΔGGas. In any case, please send us the files to rule out any error or inaccuracy of gmx_MMPBSA if exists.

Valdes-Tresanco-MS avatar Aug 17 '22 13:08 Valdes-Tresanco-MS

@Valdes-Tresanco-MS @marioernestovaldes Thanks very much for your kind replies. I really appreciate it. I reran the system using <igb = 2, PBRadii = 3, cas_intdiel=1>, and found the results were similar as <igb = 8, PBRadii = 4, cas_intdiel=1>, so I think "cas_intdiel=1" made the difference because previously I did not include this flag cas_intdiel so it was set to 0 automatically in regular GB calculations. My question is: is this flag "cas_intdiel" specific for alanine_scanning? Because in your input webpage it is listed under alanine_scanning section, so when I ran regular GB, I did not realize that I should explicitly set cas_intdiel to 1. Thank you!

leiqian-nmsu avatar Aug 17 '22 17:08 leiqian-nmsu

As described in the documentation cas_intdiel is a variable that assigns a value to the internal dielectic constant (intdiel for GB) according to the nature of the residue to be mutated according to Yan et al., 2017. However, to vary the dielectric constant the nature of the system is very important, so using cas_intdiel can be counterproductive in some cases. Note that Yan et al., 2017 determined these values for a few systems that do not necessarily fit yours. You should be careful with the variables you assign, especially the conditions under which the variable may vary in relation to your system. Use cas_intdiel only if you consider that the study by Yan et al., 2017 fits yours, otherwise assign intdiel manually. When cas_intdiel is defined, in the terminal output must explicitly show what value is assigned to the intdiel variable.

Valdes-Tresanco-MS avatar Aug 17 '22 20:08 Valdes-Tresanco-MS

@Valdes-Tresanco-MS Thank you for your reminding. I read the paper (Yan 2017). From my understanding, when I run regular GB, default intdiel (Internal dielectric constant) is 1.0, which can be seen as a "global average" value for the whole protein system. Because different protein systems with different natures(e.g. different structures) tend to have different "global average" intdiel values, so I need to set the intdiel value manually case by case for different protein system, if not, the result would not be convincing. However, when I run GB+AlanineScan with cas_intdiel=1, this setting focuses on the "local nature" of the residue to be mutated, and sets "global average" intdiel of the whole protein system according to the local nature of residue. So if the local nature of the residue to be mutated is quite different from the "global average" nature of whole protein system, this cas_intdiel=1 option would be dangerous. Is my understanding correct?

leiqian-nmsu avatar Aug 21 '22 17:08 leiqian-nmsu

Indeed. That is why cas_intdiel = 0 by default. As a number of parameters are available in gmx_MMPBSA, they should be used with care, that's why we always put the references to the respective articles. The user should carefully review the studies that we reference to know if such a configuration can be used in the system of interest. Unfortunately, most users change the parameters without any basis. Endpoint methods, such as PB/GB/RISM and LIE are generally system-dependent, so using them requires a thorough study of the system characteristics. We will put more warnings in the documentation to better guide users. Let me know if you have any other questions.

Valdes-Tresanco-MS avatar Aug 21 '22 18:08 Valdes-Tresanco-MS

@leiqian-nmsu you may also want to check this publication to shed light on the essential physics of the dielectric response for all values of ϵin and ϵout. We have implemented the ALPB approximation in case you want to use it...

hope this helps!

marioernestovaldes avatar Aug 21 '22 20:08 marioernestovaldes

@Valdes-Tresanco-MS @marioernestovaldes Thank you for your explanation! As @Valdes-Tresanco-MS mentioned "so using them (system-dependent parameters) requires a thorough study of the system characteristics". Because intdiel is one of system-dependent parameters, is there any reliable method for us to quickly determine intdiel value for a specific whole protein system? Does "a thorough study of the system characteristics" suggest we need to count and compare the numbers of polar, non-polar, and charged residues in the protein system then estimate the value of intdiel? I read the paper mentioned by @marioernestovaldes: "Incorporating variable dielectric environments into the generalized Born model", From the paper it seems that the authors were more interested in the ratio: ε_in / ε_out rather than single ε_in value for a system.

leiqian-nmsu avatar Aug 24 '22 18:08 leiqian-nmsu

the proper assignment of the dielectric constant is still in debate. It's generally accepted that systems heavily charged require a higher dielectric constant to get better results... for now, the strategy proposed by Yan et al. was tested using more than 400 mutants and worked really well although from the physical point of view it is not the most elegant one...

marioernestovaldes avatar Aug 25 '22 00:08 marioernestovaldes

@Valdes-Tresanco-MS @marioernestovaldes Thanks! I will reread this Yan et al. paper and try to find the suitable intdiel value for my system and figure it out.

leiqian-nmsu avatar Aug 29 '22 09:08 leiqian-nmsu

@Valdes-Tresanco-MS @marioernestovaldes Hope everything is going well. These days I compared ddG results calculated by both alanine scan and regular GBSA (WT => Ala mutant) (intdiel were set to the same value in both cases), but found some inconsistent results.

For example, after 10ns simulation for a protein-protein system, with alanine scan I found some important residues (ddG > 2kcal/mol when mutated to Ala) on their binding interface. Then for each of above selected residues, I mutated each to different amino acids (including alanine), after these operations, I ran 10ns simulation for each of these mutants (including WT), and calculated their dG binding and ddG binding (dG_mutant - dG_WT).

Because alanine mutants were also included in above mutations, I got two sets of ddG data for alanine mutation (dG_alanine_mutant - dG_WT) at each important position (one is from alanine scan; the other one is from above GBSA calculations). However, I found some of them were not consistent (1st one was caused by relatively large SD): e.g. posiiton1: alanine scan result: THRxALA :+2.88(+/- 0.82), but md ddG:-2.91(+/-5.3) (intdiel=3), position2: alanine scan result: TYRxALA:+4.27(+/- 0.94), but md ddG:-27.76(+/-5.1) (intdiel=3), et.al. I would like to know whether these inconsistencies are not uncommon? In addition, as you can see, the md ddG could be a very large value such as -27.76 kcal/mol when compared with alanine scan result, did their different algorithms cause this beyond-normal value?

leiqian-nmsu avatar Sep 21 '22 15:09 leiqian-nmsu

This is expected taking into account all the errors associated with the force fields. The alanine scanning method is very simple, yet very efficient and easier to reproduce, as long as the mutation by alanine does not cause a large disruption of the interface due to conformational changes. Based on this premise, using the alanine scanning method assumes that both the wild-type and the mutated complexes behave equally, and the observed differences are only associated with the point mutation. That way, all the errors associated with the force fields are canceled, in other words, there is a cancellation of errors. A quick check upon the influence of the dynamics on the calculation will be running both simulations, for the WT and the mutant, starting with the same distribution of velocities (same 'seed' in the .mdp file) and with restraints. This way will be very similar to what the alanine scanning method does in gmx_MMPBSA and the results should be similar.

In short, I will go first with the alanine scanning method in gmx_MMPBSA as long as the mutation by alanine does not cause a large disruption of the interface due to conformational changes.

hope this helps!

marioernestovaldes avatar Sep 21 '22 22:09 marioernestovaldes

Just to complement @marioernestovaldes's explanation. Scanning of alanine is accomplished by truncating by the beta carbon the selected amino acid in an already executed trajectory. However, by mutating any residue by alanine, even if you start with the same velocity distribution, it is possible that there will be changes around the mutated residue during the simulation. The advantage of alanine scanning is that it allows estimating the contribution of several residues from a single trajectory.

Valdes-Tresanco-MS avatar Sep 22 '22 00:09 Valdes-Tresanco-MS