Ligand attached to protein
I was wondering what would happen to the score of a pocket if a ligand attached to the peptide chain were present in that potential pocket. Would the score for that pocket be affected, or would the ligand be ignored?
That depends. P2Rank tries to disregard/ignore all the ligands before running prediction. Ligands attached to peptide chains should also be ignored, as long as they are stored in the PDB/CIF file as HETATM records. Peptides themselves are treated as part of the protein unless you explicitly exclude them using the chains column in the .ds file.
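To illustrate the record-type distinction mentioned above, here is a minimal sketch in Python. It is not P2Rank's actual code; the function name split_by_record_type and the truncated example lines are made up for illustration. It only shows how ATOM and HETATM records can be told apart in PDB-format text.

```python
# Illustrative only: not P2Rank's implementation, just the record-type idea.
def split_by_record_type(pdb_text):
    """Split PDB-format text into ATOM lines and HETATM lines."""
    protein_lines, hetero_lines = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # columns 1-6 hold the record name
        if record == "ATOM":
            protein_lines.append(line)
        elif record == "HETATM":
            hetero_lines.append(line)
    return protein_lines, hetero_lines


# Truncated example lines (coordinates omitted); residue numbers are made up.
example = "\n".join([
    "ATOM      1  N   LYS A 100",
    "HETATM 2001  N1  LLP A 258",
    "HETATM 3001  O   HOH A 501",
])
protein, hetero = split_by_record_type(example)
print(len(protein), "ATOM line(s),", len(hetero), "HETATM line(s)")
```

Note that in PDB files modified residues inside a chain are commonly written as HETATM records too, so a split based on record type alone cannot tell a modified residue from a free ligand.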
That said, there are some edge cases where this may not work as expected.
How would you like P2Rank to behave in such cases - exclude/ignore the peptide, the ligand, both, or neither? Do you have any specific examples or structure files where this happens?
So I contemplate two scenarios where a ligand is covalently attached to an amino acid within the peptide chain:
(1) Modified residues (non-canonical amino acids), e.g. 1MDX (LLP) or 3CRK (LA2).
(2) "Regular" ligands/cofactors that are covalently attached to the peptide chain, e.g. 2v61 (FAD attached), 4ri7 (GSH attached) or 4eic (HEC attached).
How does p2rank deal with these cases? Are these residues ignored or taken into account?
In my case, I would like to use p2rank to predict potential pockets and perform dockings on them, so I think it would be nice to take them into account. Doing so in a programmatic way (not manually) would be awesome.
Just to make sure I am expressing it correctly: I would like p2rank to NOT remove these covalently bound molecules for a user-defined set of them, and then to generate (through fpocket) and rank (through the random forest) pockets in the presence of these moieties, so that most or all of the corresponding pocket would disappear due to the covalently bound molecule.
Thanks for adding clarification. I see a lot of sense in having this implemented in P2Rank.
(2) Regarding cofactors: so you’d like an option to "consider specified cofactors as part of the protein" on a per-dataset and/or per-protein basis. Per-dataset: could be implemented as a command-line parameter with a list of group/residue names in the PDB (e.g., HEM). Per-protein: this could be a column in the .ds file where you define specific molecules, not just by name but also by group_id/atom_id.
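To make the two options concrete, here is a rough Python sketch of the selection logic. It is only an assumption about how it could look, not an existing P2Rank option: the function select_protein_lines and the parameters keep_names (per-dataset, by residue name) and keep_groups (per-protein, by chain and residue number) are hypothetical names.

```python
# Hypothetical sketch: which PDB lines would count as "protein" if the user
# could whitelist cofactors. Not an existing P2Rank parameter or API.
def select_protein_lines(pdb_lines, keep_names=frozenset(), keep_groups=frozenset()):
    """Return ATOM lines plus HETATM lines that the user asked to keep.

    keep_names:  residue names, e.g. {"FAD"} (per-dataset style)
    keep_groups: (chain, residue number) pairs, e.g. {("A", 600)} (per-protein style)
    """
    selected = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "ATOM":
            selected.append(line)
        elif record == "HETATM":
            resname = line[17:20].strip()  # columns 18-20
            chain = line[21:22]            # column 22
            resseq = int(line[22:26])      # columns 23-26
            if resname in keep_names or (chain, resseq) in keep_groups:
                selected.append(line)
    return selected


# Truncated example lines (coordinates omitted); "LIG" is a made-up ligand name.
lines = [
    "ATOM      1  N   GLY A  10",
    "HETATM 4001  PA  FAD A 600",
    "HETATM 4101  C1  LIG A 700",
    "HETATM 4201  O   HOH A 801",
]
by_name = select_protein_lines(lines, keep_names={"FAD"})
by_group = select_protein_lines(lines, keep_groups={("A", 700)})
print(len(by_name), len(by_group))
```

Selecting by name keeps every copy of a cofactor in the dataset, while selecting by chain and residue number lets you keep one specific molecule in one structure.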
What do you think?
By the way, I don't see any obstacle in having this option implemented for standard P2Rank (predictions without fpocket), but for fpocket rescoring there is an issue. As far as I know, fpocket doesn't yet allow you to specify which cofactors should be considered part of the protein. It does have a hard-coded list of such cofactors, so it might work most of the time and could eventually be exposed as a configurable option.
Does it matter for your use case whether the cofactors are covalently bound, or is being able to specify them sufficient?
(1) For modified residues: in most cases P2Rank should already consider them part of the protein; at least that's the intention. Do you have any specific PDB entries where this doesn't work as expected? If so, please share them, because in that case it's a bug I'd like to fix. I'm not sure how fpocket deals with modified residues, but I guess it's the same: the intention is to have them as part of the protein, but it might fail to do so in certain cases.
(2) I think a per-protein implementation would be preferable as it would give more flexibility. Specifying the cofactors would be enough, no need to distinguish between covalent vs non-covalent binding.
I’m only interested in the p2rank predictions, not fpocket ranking. However, as far as I understand, p2rank “default” mode starts with fpocket pockets, which are then ranked through the random forest. So the issue with fpocket rescoring would be the same for the “default” p2rank prediction, right? Or are you using a modified version of fpocket?
(1) In the examples we’ve seen so far, p2rank gives a lower score in the presence of these modified amino acids, which makes sense. However, as far as I understand, p2rank uses amino-acid-level descriptors, so what is the random forest using for these unusual amino acids?
P.S. (Edited) I made a tiny mistake when writing the message. I just made a change in the second paragraph, second line. Sorry for that! Hope I did not mislead you.
(2) Ok I'm going to implement this. Will you be able to help me with testing?
P2Rank doesn't use fpocket by default. That is, if you run prank predict, fpocket is not called. You can rescore pockets predicted by fpocket by running prank rescore or prank fpocket-rescore, but as I understand it, this is not what you care about?
(1) One reason for lower scores could be the fact that P2Rank doesn't do as comprehensive substitutions of modified AA codes as pdbfixer does (non-canonical amino acids). I'm going to incorporate these substitutions into P2Rank and we'll see what happens. Thank you for pointing me to this.
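By "substitutions" I mean something along these lines, sketched in Python. The table below is a small illustrative sample of modified residues and their parent amino acids; it is not copied from pdbfixer or from P2Rank, and the names MODIFIED_TO_PARENT and parent_residue are made up for the example.

```python
# Illustrative, partial mapping from modified residue codes to parent amino
# acids. A real table would be much larger; this is not pdbfixer's or P2Rank's.
MODIFIED_TO_PARENT = {
    "MSE": "MET",  # selenomethionine
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # phosphotyrosine
    "LLP": "LYS",  # lysine with covalently bound pyridoxal phosphate
}


def parent_residue(resname, custom=None):
    """Return the parent amino acid code, or the name unchanged if unknown.

    custom: optional user-supplied mapping that extends or overrides the table.
    """
    table = dict(MODIFIED_TO_PARENT)
    if custom:
        table.update(custom)
    name = resname.upper()
    return table.get(name, name)


for code in ("LLP", "MSE", "GLY"):
    print(code, "->", parent_residue(code))
```

With such a mapping, residue-level features for a modified residue can fall back to those of its parent amino acid instead of being treated as unknown, and the custom argument leaves room for user-defined entries.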
It would be very helpful if you could share more examples (of both cases 1 and 2) that you care about or find interesting. I will use them as a small dataset for unit tests.
Hi Radoslav,
sorry for intruding into your conversation, I'm Gonzalo Colmenarejo, Andrés' supervisor. Thank you for your willingness to provide an update of p2rank, which we consider a very useful tool, for this docking case.
(2) Yes, Andrés will be very happy to test this, as we're very interested in docking compounds in the presence of cofactors. Thank you for the clarification about p2rank. Yes, p2rank "default" mode is what we are interested in.
(1) I guess that p2rank providing lower scores in the presence of modified amino acids or covalently bound cofactors should in principle be good, as a bulkier moiety in the attachment region would reduce the pocket space and therefore make it more difficult for a ligand to bind there. The exception would be an amino acid modification or cofactor that provides features fostering ligand binding there, in which case one would expect an increase of the score. We're not clear about how p2rank takes into account or balances these two factors, how it would represent the cofactor or modified amino acid in the RF model, and how that is related to the pdbfixer amino acid code set. It would be great if you could provide some info.
Regarding a dataset with modified amino acids (1):
6M5O: LLP + glycodeoxycholic acid
2TPL: LLP + hydroxyphenylpropionic acid
4BTY: TQP + pyridazone
2CFK: TPQ + ruthenium complex
Regarding a dataset with covalently attached cofactors (2):
1AHP: PLP + alpha-maltose
1AXR: PLP + azolopyridine
1ELI: FAD + pyrrolocarboxylate
1OJA: FAD + isatin
Hi Radoslav,
do you have any news about this?
Thanks
Hi bbu-imdea, thank you for providing more details and examples. I'm already working on P2Rank updates: a) custom modified amino acid mappings with a pdbfixer preset and b) the ability to define custom cofactors as part of the protein. I'll share more details soon and I'll appreciate your input.
Hello Radoslav, have any of the issues by any chance been solved?
We are very interested in using any of those updates as soon as they are available (even if one of them has not been completed yet).