pypath

Translating identifiers

Open elifcevrim opened this issue 2 years ago • 5 comments

Describe the question: Can we access the full list of identifier types that can be used as source or target IDs? For example, can we translate drug IDs to their alternatives?

Can we get both the source and target identifiers as the output of the mapping tool, for example as a dictionary, instead of only the target identifiers as a list?

Desktop (please complete the following information):
OS: Windows
Python version: 3.8
Version or commit hash: v0.13.13

elifcevrim avatar Dec 28 '21 14:12 elifcevrim

That's a very good question, it would be super useful to access these at a single point. This is not possible at the moment, but I can easily implement it.

deeenes avatar Dec 28 '21 21:12 deeenes

Hi Elif, I've just added two new methods to utils.mapping.Mapper: mapping_tables returns a list of the available ID translation tables, and id_types returns a list of identifier types. Use them like this:

```python
from pypath.utils import mapping
mapper = mapping.get_mapper()
mapper.mapping_tables()
mapper.id_types()
```

Best,

Denes

deeenes avatar Jan 01 '22 02:01 deeenes

One more thing: the first time any ID translation table is loaded, it has to be downloaded, so the loading time depends on the download speed. For UniProt or BioMart data it's typically a few seconds to a couple of minutes, but in some other cases it might take as long as half an hour. After that, loading the table from disk is fast, and if you used a table in the past 5 minutes, it remains loaded, making subsequent lookups very fast.
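To illustrate the "remains loaded for 5 minutes" behaviour, here is a minimal sketch of a cache that drops tables unused for longer than a given lifetime. It is not pypath's actual implementation: the class ExpiringTableCache, the toy loader and the fake clock are all hypothetical names made up for this example.

```python
import time


class ExpiringTableCache:
    """Keep loaded tables in memory; drop those unused for `lifetime` seconds.

    Hypothetical illustration only, not how pypath implements it.
    """

    def __init__(self, loader, lifetime=300.0, clock=time.monotonic):
        self._loader = loader      # callable: key -> table (the slow step)
        self._lifetime = lifetime  # 300 s, i.e. the 5 minutes mentioned above
        self._clock = clock        # injectable so the example can fake time
        self._tables = {}          # key -> (table, time of last use)

    def get(self, key):
        now = self._clock()
        self._expire(now)
        if key in self._tables:
            table, _ = self._tables[key]
        else:
            table = self._loader(key)
        self._tables[key] = (table, now)  # refresh the last-use time
        return table

    def _expire(self, now):
        stale = [
            key
            for key, (_, last_used) in self._tables.items()
            if now - last_used > self._lifetime
        ]
        for key in stale:
            del self._tables[key]


# Toy loader standing in for a slow download or a read from disk.
load_calls = []


def toy_loader(key):
    load_calls.append(key)
    return {'P00533': {'IPR000719'}}


fake_now = [0.0]  # fake clock, in seconds
cache = ExpiringTableCache(toy_loader, lifetime=300.0, clock=lambda: fake_now[0])

cache.get(('uniprot', 'interpro'))  # first use: the loader is called
fake_now[0] = 120.0
cache.get(('uniprot', 'interpro'))  # 2 minutes later: served from memory
fake_now[0] = 500.0
cache.get(('uniprot', 'interpro'))  # 380 s after last use: loaded again
print(len(load_calls))
```

Each lookup refreshes the last-use time, so a table that is queried regularly never expires; only a gap longer than the lifetime triggers a reload.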

deeenes avatar Jan 03 '22 19:01 deeenes

Hi Denes,

When we use the mapping module with multiple source IDs, we only get the target IDs. For example, with mapping.map_names(protein_list, 'uniprot', 'interpro') we get a list of the domains found in the given proteins. Could we instead get protein-specific domain IDs, like {protein: ["domain1", "domain2", ...]}? As it is, we can't tell which UniProt ID each of these domain IDs belongs to.

elifcevrim avatar Feb 17 '22 12:02 elifcevrim

Hi Elif,

Sure, you can use a dict comprehension for that:

```python
from pypath.utils import mapping

uniprots = ['P00533', 'O75385']
# Translate one UniProt ID at a time, so each result stays tied to its source ID
domains = {u: mapping.map_name(u, 'uniprot', 'interpro') for u in uniprots}
domains
# {'O75385': {'IPR022708', 'IPR011009', 'IPR016237', 'IPR017441', 'IPR000719', 'IPR008271'},
#  'P00533': {'IPR016245', 'IPR032778', 'IPR001245', 'IPR020635', 'IPR006212', 'IPR011009', 'IPR006211', 'IPR000494', 'IPR009030', 'IPR017441', 'IPR000719', 'IPR008266', 'IPR036941'}}
```
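If you also need the reverse lookup (which proteins carry a given domain), the same per-ID approach can be wrapped in two small helpers. This is only a sketch: map_each and invert are hypothetical names, not part of pypath, and toy_translate is a stand-in that uses a subset of the results shown above so the example runs without downloading anything.

```python
from collections import defaultdict


def map_each(ids, translate):
    """Return {source_id: set of target IDs}, translating one ID at a time."""
    return {i: set(translate(i)) for i in ids}


def invert(mapping_by_source):
    """Turn {source: {targets}} into {target: {sources}}."""
    by_target = defaultdict(set)
    for source, targets in mapping_by_source.items():
        for target in targets:
            by_target[target].add(source)
    return dict(by_target)


# Stand-in translator (an assumption for this example): a subset of the
# UniProt-to-InterPro results shown above, kept in a plain dict.
toy_table = {
    'P00533': {'IPR000719', 'IPR011009', 'IPR017441', 'IPR008266'},
    'O75385': {'IPR000719', 'IPR011009', 'IPR017441', 'IPR008271'},
}


def toy_translate(uniprot):
    return toy_table.get(uniprot, set())


domains = map_each(['P00533', 'O75385'], toy_translate)
proteins_by_domain = invert(domains)

print(sorted(proteins_by_domain['IPR000719']))  # domain shared by both proteins
print(sorted(proteins_by_domain['IPR008266']))  # domain found in P00533 only
```

With pypath, you would pass something like lambda u: mapping.map_name(u, 'uniprot', 'interpro') as the translate argument instead of toy_translate.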

deeenes avatar Feb 17 '22 16:02 deeenes