brendapy

Complete mapping of substance names

Open matthiaskoenig opened this issue 6 years ago • 8 comments

In the flat file, substances are provided only by name. To my knowledge, no file exists that maps these names to BRENDA ligand IDs or ChEBI identifiers, so the names cannot be resolved directly. Substance names need to be converted to proper ontology annotations; for example, D-glucose must be mapped to its ChEBI entry.

At the moment, substances in BRENDA are mapped to ChEBI by exact matching of their names against ChEBI names and synonyms. This mapping is far from complete, and many substances cannot be resolved. Some heuristic name matching is probably needed to solve this fully (if anybody knows of a mapping file containing BRENDA names, please let me know).
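A minimal sketch of the matching strategy described above, assuming a small in-memory synonym table: try an exact match first, then fall back to a normalized form of the name. The `normalize` rules, the function names, and the two table entries are illustrative assumptions, not the mapping brendapy actually uses.

```python
import re
from typing import Dict, Optional

# Illustrative synonym table (name -> ChEBI ID); a real table would be
# built from ChEBI's names and synonyms.
CHEBI_BY_NAME: Dict[str, str] = {
    "D-glucose": "CHEBI:17634",
    "L-alanine": "CHEBI:16977",
}


def normalize(name: str) -> str:
    """Reduce a substance name to a canonical form for lenient lookup."""
    name = name.strip().lower()
    name = name.replace("_", " ")     # underscores used in place of spaces
    name = re.sub(r"\s+", " ", name)  # collapse runs of whitespace
    return name


# Index the table by normalized name once, up front.
CHEBI_BY_NORMALIZED = {normalize(k): v for k, v in CHEBI_BY_NAME.items()}


def map_substance(name: str) -> Optional[str]:
    """Return a ChEBI ID by exact match, else by normalized match, else None."""
    if name in CHEBI_BY_NAME:
        return CHEBI_BY_NAME[name]
    return CHEBI_BY_NORMALIZED.get(normalize(name))
```

Keeping the exact match as the first step means the heuristic can only add mappings, never change ones that already resolve.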

Attached are the substance names that cannot be mapped:

unmapped_substances.txt

matthiaskoenig avatar Aug 22 '19 14:08 matthiaskoenig

@matthiaskoenig Hi, I found this link (http://mmtb.tu-bs.de/idparser) for looking up BRENDA ligand IDs. I tried it on a few of the compounds listed in unmapped_substances.txt.

The output is a BRENDA ligand ID. I am not sure how a BRENDA ligand ID can be mapped to other compound identifiers such as KEGG IDs, though. I have written to the maintainers of http://mmtb.tu-bs.de/idparser to ask how to map BRENDA IDs to KEGG IDs (or others) and will post here if I get a response.

EDIT: Searching http://mmtb.tu-bs.de/ by compound name returns the list of all known synonyms of the compound and also maps the compound to a BRENDA compound ID. For instance, a search for L-alanine returns these synonyms:

(S)-2-aminopropanoic_acid
(S)-alanine
2-Aminopropanoate
2-Aminopropionate
Ala
alanine
alanine/in
alanine/out
alpha-alanine
L-2-Aminopropionate
L-2-aminopropionic_acid
L-Ala
L-alanin
L-alanine
L-alanine/in
L-alanine/out
L-alpha-alanine
L-alpha-aminopropionic_acid

All of the above synonyms map to a single BRENDA compound ID: https://www.brenda-enzymes.org/ligand.php?brenda_group_id=97. In BRENDA, L-alanine is linked to the InChIKey QNAYBMKLOCPYGJ-REOHCLBHSA-N.
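As a small illustration of how such a result could be stored, the sketch below pulls the group ID out of the ligand URL and checks that the InChIKey has the standard 14-10-1 block layout. The function names and the record layout are hypothetical; only the URL, the InChIKey, and the synonyms come from the search result above.

```python
import re
from urllib.parse import parse_qs, urlparse

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def brenda_group_id(url: str) -> int:
    """Extract the brenda_group_id query parameter from a ligand URL."""
    query = parse_qs(urlparse(url).query)
    return int(query["brenda_group_id"][0])


def is_inchikey(value: str) -> bool:
    """Check the shape of an InChIKey (format only, not chemical validity)."""
    return INCHIKEY_RE.fullmatch(value) is not None


# Hypothetical record layout for one BRENDA compound group.
record = {
    "group_id": brenda_group_id(
        "https://www.brenda-enzymes.org/ligand.php?brenda_group_id=97"
    ),
    "inchikey": "QNAYBMKLOCPYGJ-REOHCLBHSA-N",
    "synonyms": ["L-alanine", "L-Ala", "(S)-alanine"],
}
```

Keying records on the group ID rather than on any one name lets every synonym resolve to the same structure.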

Hope this is useful

Thanks, Deepa

DeepaMahm avatar Jan 22 '20 06:01 DeepaMahm

Hi @DeepaMahm, thanks for the input. I will have a look at the resource. Best Matthias

matthiaskoenig avatar Feb 17 '20 13:02 matthiaskoenig

Hi @matthiaskoenig, it is now possible to access the BRENDA database using zeep in Python 3, which was problematic before. BRENDA fixed this in the last week.

Please check this link (https://www.brenda-enzymes.org/soap.php). In the list of fields returned in the output,

parameters = ( "[email protected]",password,"ecNumber*1.1.1.1","organism*Homo sapiens","kmValue*",
              "kmValueMaximum*","substrate*","commentary*","ligandStructureId*","literature*" )

it appears that ligandStructureId can also be obtained. When I tried this for "ecNumber*1.1.1.1" and "organism*Homo sapiens", however, every field except ligandStructureId was returned.

I have raised this with BRENDA again and will post here once it is working.

Thanks, Deepa

DeepaMahm avatar Feb 18 '20 02:02 DeepaMahm

Hi @matthiaskoenig

This is an update on obtaining the ligandStructureId of BRENDA compounds, following up on the thread above.

BRENDA's zeep interface was fixed a couple of months ago, and it is now possible to obtain the following fields (listed in the parameters variable below) through a query:

from zeep import Client
import hashlib

wsdl = "https://www.brenda-enzymes.org/soap/brenda_zeep.wsdl"
# BRENDA expects the SHA-256 hex digest of the account password.
password = hashlib.sha256("enterpassword".encode("utf-8")).hexdigest()
client = Client(wsdl)
# Email and hashed password come first, then "field*value" filters,
# then "field*" entries for the fields to return.
parameters = ("enteremailid", password, "ecNumber*1.1.1.1", "organism*Homo sapiens", "kmValue*",
              "kmValueMaximum*", "substrate*", "commentary*", "ligandStructureId*", "literature*")
resultString = client.service.getKmValue(*parameters)
print(resultString)
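The parameter tuple follows a regular pattern: credentials first, then `field*value` filters, then bare `field*` entries for the fields to return. A minimal sketch of a helper that assembles it is shown below; `hash_password` and `build_parameters` are hypothetical names, and the layout is inferred only from the example above, not from BRENDA's documentation.

```python
import hashlib
from typing import Dict, Sequence, Tuple


def hash_password(password: str) -> str:
    """Return the SHA-256 hex digest of the password, as in the example above."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def build_parameters(
    email: str,
    password: str,
    filters: Dict[str, str],
    fields: Sequence[str],
) -> Tuple[str, ...]:
    """Assemble (email, hashed password, 'field*value'..., 'field*'...)."""
    filter_terms = [f"{name}*{value}" for name, value in filters.items()]
    field_terms = [f"{name}*" for name in fields]
    return (email, hash_password(password), *filter_terms, *field_terms)


# Example with placeholder credentials (assumed values, not a real account).
params = build_parameters(
    "user@example.org",
    "enterpassword",
    {"ecNumber": "1.1.1.1", "organism": "Homo sapiens"},
    ["kmValue", "substrate", "ligandStructureId"],
)
```

The resulting tuple can be passed to a service call in the same way as above, for example `client.service.getKmValue(*params)`.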

After this, the BRENDA "ligandStructureId*" could be mapped to other compound identifiers, such as ChEBI or SABIO compound IDs, using the UniChem service at https://www.ebi.ac.uk/unichem/
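A hedged sketch of the post-processing step, assuming a UniChem lookup returns a list of records with `src_id` and `src_compound_id` keys. That record shape, the source numbers used here (6 for KEGG, 7 for ChEBI, 18 for HMDB), and the sample values are assumptions to verify against UniChem's own source list and documentation; no network call is made.

```python
from typing import Dict, Iterable, Mapping

# Assumed UniChem source IDs; verify against UniChem's source list.
WANTED_SOURCES: Dict[str, str] = {"6": "kegg", "7": "chebi", "18": "hmdb"}


def select_cross_references(
    records: Iterable[Mapping[str, str]],
    wanted: Mapping[str, str] = WANTED_SOURCES,
) -> Dict[str, str]:
    """Keep the first identifier found for each wanted source."""
    result: Dict[str, str] = {}
    for record in records:
        label = wanted.get(str(record["src_id"]))
        if label is not None and label not in result:
            result[label] = record["src_compound_id"]
    return result


# Illustrative records for L-alanine in the assumed response shape.
sample = [
    {"src_id": "7", "src_compound_id": "16977"},
    {"src_id": "6", "src_compound_id": "C00041"},
    {"src_id": "22", "src_compound_id": "5950"},  # not in WANTED_SOURCES
]
xrefs = select_cross_references(sample)
```

Passing the wanted sources as a parameter keeps the source numbers in one place, so they are easy to correct if the assumed IDs turn out to differ.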

I hope this would be useful for mapping Brenda compound/substance names.

Thanks, Deepa

DeepaMahm avatar May 28 '20 11:05 DeepaMahm

Hi @matthiaskoenig, I tried mapping BRENDA ligand IDs to KEGG, HMDB and ChEBI identifiers via UniChem's REST interface. Please check this at your convenience.

DeepaMahm avatar Jun 22 '20 06:06 DeepaMahm

#!/usr/bin/python
from zeep import Client
import hashlib

wsdl = "https://www.brenda-enzymes.org/soap/brenda_zeep.wsdl"
# BRENDA expects the SHA-256 hex digest of the account password.
password = hashlib.sha256("enterpassword".encode("utf-8")).hexdigest()
client = Client(wsdl)
# Earlier Km value query, kept for reference (not executed here).
parameters = ("emailid", password, "ecNumber*1.1.1.1", "organism*Homo sapiens", "kmValue*",
              "kmValueMaximum*", "substrate*", "commentary*", "ligandStructureId*", "literature*")
# resultString = client.service.getKmValue(*parameters)

# Look up the ligand structure ID for one compound name;
# "id" is a placeholder for the first credential, as in the original post.
param = ("id", password, "NAD+")
resultString = client.service.getLigandStructureIdByCompoundName(*param)

print(resultString)
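To resolve many names rather than one, the lookup can be wrapped in a loop that skips duplicates and records failures instead of stopping. This is a minimal sketch: `lookup_ligand_ids` is a hypothetical helper, and the lookup function is passed in so that the zeep call above (or any offline table) can be plugged in.

```python
from typing import Callable, Dict, Iterable, Optional


def lookup_ligand_ids(
    names: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Resolve each distinct name once; store None when a lookup fails."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        if name in results:  # already resolved, avoid a repeated request
            continue
        try:
            results[name] = lookup(name)
        except Exception:  # broad on purpose: one bad name should not abort the batch
            results[name] = None
    return results
```

With the client from the script above, the lookup could be supplied as `lambda name: client.service.getLigandStructureIdByCompoundName("id", password, name)`.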

DeepaMahm avatar Jun 30 '20 16:06 DeepaMahm

Hi @matthiaskoenig @DeepaMahm --

Has there been any progress on this task? I've run into a similar problem, compounded by the scale I'm working at, which is large enough that I would ideally avoid zeep or any other online client.

Specifically, I'm looking to map BRENDA ligands to their InChIKeys, ideally through ChEBI. Are there any flat files available now from which I can get BRENDA ligand ID > ChEBI > etc.?

Happy to open another issue if I've superseded this particular one.

Thanks so much,

Braden Tierney

b-tierney avatar Jul 18 '21 16:07 b-tierney