PHARMGKB.CHEMICAL -> PHARMGKB.DRUG prefix

Open sierra-moxon opened this issue 11 months ago • 1 comment

From @colleenXu:

Is it a bit inconsistent for the prefix PHARMGKB.PATHWAYS to be plural like this (vs KEGG.PATHWAY)? asking for clarification:

  • is it more correct to have different prefixes for each type (PHARMGKB.CHEMICAL, PHARMGKB.DISEASE, etc.) rather than 1 prefix (PHARMGKB)?
  • is it fine that we keep the actual ID value, which always starts with "PA", ex: PA25408, PA444750? We don't remove this "PA"...

Sierra Moxon (SRI) I try hard to follow the lead at bioregistry.org in selecting prefixes for CURIEs. Most of the time, the resource itself will choose its prefix scheme, following best practices there. I don't see a specific guideline around pluralizing: https://github.com/biopragmatics/bioregistry/blob/main/docs/CONTRIBUTING.md#submitting-new-prefixes. A ticket there for clarification on pluralizing best practices would be wonderful.

Sierra Moxon (SRI) In the PHARMGKB examples, different prefixes already exist in bioregistry and I would use them as they are there.

In general, it really depends on the Resource that is responsible for the expansion of CURIEs -> IRIs. If a source has different URL schemes for a chemical vs. a disease, and that source can disambiguate chemicals from diseases and route the expansion correctly internally, then a prefix that encompasses all of the different types is likely fine. An example of a resource like this is ZFIN. zfin.org/ZFIN:[any_specific_id] is the expansion for all identifiers from ZFIN whether the 'any_specific_id' is a chemical, gene, disease, etc. ZFIN handles redirecting the expansion to the appropriate IRI.

For most resources though, having a separate ("." delimited) prefix that expands different domains (for lack of a better word) to specific IRIs for those domains works better.
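The two approaches described above (one prefix for the whole resource vs. a separate "."-delimited prefix per domain) can be sketched as a small CURIE expander. This is only an illustration: `PREFIX_MAP` and `expand_curie` are hypothetical names, the PharmGKB URL templates are assumptions rather than values copied from bioregistry, and the ZFIN template just adds `https://` to the pattern quoted in the thread.

```python
# Hypothetical prefix map: each CURIE prefix points at one URL template.
# The PharmGKB templates are assumed for illustration, not taken from bioregistry.
PREFIX_MAP = {
    # One template per domain, so the prefix itself routes the expansion.
    "PHARMGKB.DRUG": "https://www.pharmgkb.org/chemical/{}",
    "PHARMGKB.DISEASE": "https://www.pharmgkb.org/disease/{}",
    "PHARMGKB.PATHWAYS": "https://www.pharmgkb.org/pathway/{}",
    # One template for every type; the resource redirects internally.
    "ZFIN": "https://zfin.org/ZFIN:{}",
}


def expand_curie(curie: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Expand 'PREFIX:local_id' into an IRI using the prefix's URL template."""
    # Split on the first colon only; local IDs may contain further colons.
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    key = prefix.upper()
    if key not in prefix_map:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefix_map[key].format(local_id)
```

For example, `expand_curie("PHARMGKB.DISEASE:PA444750")` fills the disease template with the full local ID, "PA" included (the pairing of that ID with the disease prefix is for illustration only).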

Colleen Xu (Exploring Agent) so this answers the first two questions, so the 3rd is still hanging

and huh interesting that bioregistry has pharmgkb.disease / gene / pathways and they all resolve as-expected, but not chemicals. instead it has pharmgkb.drug, and clicking the "resolve" button brings me to the /chemical/ page...

oh and I see the bioregistry pages list the pattern for IDs as starting with PA so I guess that's fine? (3rd question)

Sierra Moxon (SRI) Bioregistry calls these kinds of additions "bananas" (I do not know why they are called bananas): https://github.com/biopragmatics/bioregistry/issues?q=is%3Aissue+is%3Aopen+banana. I think the regular expression on the bioregistry prefix should show how the "local" part of the identifier is constructed.

Colleen Xu (Exploring Agent) hmmm there's a "Pattern for Local Unique Identifiers" part of the pages though, which seems immediately helpful?

like this for chembl.compound saying the ID should also start with CHEMBL

Sierra Moxon (SRI) yep, for new prefixes, I think bioregistry suggests avoiding "bananas" but supports them in existing prefixes via that "Pattern for Local Unique Identifiers" paradigm.
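A rough sketch of how a "Pattern for Local Unique Identifiers" could be used to check that the "banana" is kept in the local ID. The patterns below are assumptions based on this thread (PharmGKB IDs start with "PA", ChEMBL compound IDs start with "CHEMBL"); the authoritative patterns are the ones on the bioregistry pages. `LOCAL_ID_PATTERNS` and `is_valid_local_id` are hypothetical names.

```python
import re

# Assumed patterns, inferred from the discussion above rather than
# copied from bioregistry; check the registry pages for the real ones.
LOCAL_ID_PATTERNS = {
    "pharmgkb.drug": re.compile(r"PA\d+"),
    "chembl.compound": re.compile(r"CHEMBL\d+"),
}


def is_valid_local_id(prefix: str, local_id: str) -> bool:
    """Return True if local_id matches the pattern registered for prefix."""
    pattern = LOCAL_ID_PATTERNS.get(prefix.lower())
    if pattern is None:
        raise KeyError(f"no pattern registered for prefix: {prefix!r}")
    # fullmatch requires the whole string to match, so "PA" must be present
    # and nothing may follow the digits.
    return pattern.fullmatch(local_id) is not None
```

Under these assumed patterns, `is_valid_local_id("pharmgkb.drug", "PA25408")` passes, while the same ID with the "PA" stripped does not.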

Colleen Xu (Exploring Agent) so I think the last unresolved part of this convo is the "pharmgkb.drug" in bioregistry vs biolink-model using pharmgkb.chemical https://ncatstranslator.slack.com/archives/C014B3JAX36/p1689014242439179?thread_ts=1688745603.139509&cid=C014B3JAX36

Sierra Moxon (SRI) yeah, we need to reconcile. easiest path is for us to update biolink to use the recognized prefix from bioregistry. And, to open a ticket in bioregistry for a synonym prefix of pharmgkb.chemical to be added to the pharmgkb.drug prefix.

Colleen Xu (Exploring Agent) yeah, not urgent though. BTE is using PHARMGKB.CHEMICAL, for now, to comply with biolink-model 3.5.0
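Until biolink-model and bioregistry are reconciled, a consumer could treat PHARMGKB.CHEMICAL as a synonym of PHARMGKB.DRUG, which is the direction suggested above. A minimal sketch, assuming a hand-maintained synonym table; `PREFIX_SYNONYMS` and `normalize_curie` are hypothetical names and not part of biolink-model, bioregistry, BTE, or MolePro.

```python
# Hypothetical synonym table: non-canonical prefix -> bioregistry prefix.
PREFIX_SYNONYMS = {
    "PHARMGKB.CHEMICAL": "PHARMGKB.DRUG",
}


def normalize_curie(curie: str) -> str:
    """Rewrite a CURIE so it uses the canonical prefix; keep the local ID as-is."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    # Prefixes without a synonym entry are returned unchanged.
    canonical = PREFIX_SYNONYMS.get(prefix.upper(), prefix)
    return f"{canonical}:{local_id}"
```

With this table, `normalize_curie("PHARMGKB.CHEMICAL:PA25408")` gives `PHARMGKB.DRUG:PA25408`, and CURIEs with other prefixes pass through untouched.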

sierra-moxon, Jul 11 '23 23:07

@vdancik MolePro also uses PHARMGKB.CHEMICAL - we need to review with Chemical WG.

sierra-moxon, Mar 28 '24 18:03