hgvs Dataprovider - get canonical / MANE transcript

There is often a need to pick one transcript for a gene. This is often referred to as the canonical transcript, and nowadays (clinically) it is usually the MANE transcript.

It would be good to add a method to the data provider to retrieve the MANE transcript for a gene name.

Transcripts could also carry a field/flag indicating whether they are MANE transcripts.

Once the data provider API is defined, implementations such as UTA or cdot could implement it.
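A minimal sketch of how the proposed API could look. Everything here is hypothetical: `ManeAwareDataProvider`, `get_transcripts_for_gene()`, `get_mane_transcript()`, the `TxRecord` type with its `is_mane` flag, and the made-up accessions are illustrations, not existing hgvs names. `InMemoryDataProvider` only stands in for a real backend such as UTA or cdot.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TxRecord:
    """Hypothetical transcript record carrying a MANE flag."""
    tx_ac: str
    gene: str
    is_mane: bool = False


class ManeAwareDataProvider(ABC):
    """Hypothetical extension of the data provider interface."""

    @abstractmethod
    def get_transcripts_for_gene(self, gene: str) -> List[TxRecord]:
        """Return all transcripts known for a gene symbol."""

    def get_mane_transcript(self, gene: str) -> Optional[str]:
        # Default: the first transcript flagged as MANE, or None.
        # Backends with a direct lookup (e.g. a MANE table) can override this.
        for tx in self.get_transcripts_for_gene(gene):
            if tx.is_mane:
                return tx.tx_ac
        return None


class InMemoryDataProvider(ManeAwareDataProvider):
    """Toy backend standing in for UTA or cdot."""

    def __init__(self, records: List[TxRecord]):
        self._by_gene: Dict[str, List[TxRecord]] = {}
        for rec in records:
            self._by_gene.setdefault(rec.gene, []).append(rec)

    def get_transcripts_for_gene(self, gene: str) -> List[TxRecord]:
        return self._by_gene.get(gene, [])


if __name__ == "__main__":
    # Made-up accessions, for illustration only.
    hdp = InMemoryDataProvider([
        TxRecord("NM_999901.1", "GENE1"),
        TxRecord("NM_999902.1", "GENE1", is_mane=True),
    ])
    print(hdp.get_mane_transcript("GENE1"))  # NM_999902.1
```

Putting a default `get_mane_transcript()` on the base class means a backend only has to expose the flag; one with a dedicated lookup can override it.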

This is necessary to implement #517.

It also came up as a request in #743.

Aug 05 '24 11:08 davmlaw

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

Nov 04 '24 02:11 github-actions[bot]

The main question here is how to handle the UTA implementation. It doesn't look like a modification to UTA will be available any time soon.

I could make a shim around calls to UTA and implement the MANE functionality by loading MANE.GRCh38.v1.4.summary.txt.gz (1.1 MB).

So you would either supply that file, or get a NotImplementedError.
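A rough sketch of that shim, assuming the summary file is tab-separated with a header containing `symbol`, `RefSeq_nuc` and `MANE_status` columns (verify against the release you download). `ManeShim`, `load_mane_select()` and `get_mane_transcript()` are hypothetical names; `hdp` is any existing data provider object, whose other methods pass through untouched.

```python
import csv
import gzip
from typing import Dict, Iterable, Optional


def load_mane_select(lines: Iterable[str]) -> Dict[str, str]:
    """Map gene symbol -> RefSeq accession of its MANE Select transcript.

    Column names are assumed from the MANE summary file header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["symbol"]: row["RefSeq_nuc"]
        for row in reader
        if row["MANE_status"] == "MANE Select"
    }


class ManeShim:
    """Hypothetical wrapper: delegates to an existing data provider (e.g. UTA)
    and answers MANE queries from a locally supplied summary file."""

    def __init__(self, hdp, mane_select: Optional[Dict[str, str]] = None):
        self._hdp = hdp
        self._mane_select = mane_select

    @classmethod
    def from_summary_file(cls, hdp, path: str) -> "ManeShim":
        with gzip.open(path, "rt", newline="") as fh:
            return cls(hdp, load_mane_select(fh))

    def __getattr__(self, name):
        # Only called for attributes the shim lacks, so get_tx_for_region etc.
        # fall through to the wrapped provider. Private names are refused to
        # avoid infinite recursion if _hdp is not yet set (e.g. during copying).
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._hdp, name)

    def get_mane_transcript(self, gene: str) -> Optional[str]:
        if self._mane_select is None:
            raise NotImplementedError("MANE lookups need a MANE summary file")
        return self._mane_select.get(gene)
```

Usage would be along the lines of `ManeShim.from_summary_file(hdp, "MANE.GRCh38.v1.4.summary.txt.gz")`; constructing `ManeShim(hdp)` without the file keeps every existing call working and only the MANE lookup raises.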

I will have a crack at this when I get home (I'm currently at a conference).

Nov 07 '24 02:11 davmlaw

This could be implemented around the get_tx_for_region method, e.g. as a second filter on top of the initial response that only returns tx_acs belonging to MANE transcripts.
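A sketch of that second filter, assuming the rows returned by `get_tx_for_region` can be indexed by `"tx_ac"` (as UTA-backed rows can) and that the set of MANE accessions comes from elsewhere, e.g. a MANE summary file. `filter_to_mane()` is a hypothetical helper, not part of hgvs.

```python
from typing import Iterable, List, Mapping, Set


def _unversioned(ac: str) -> str:
    # "NM_999901.2" -> "NM_999901"
    return ac.split(".", 1)[0]


def filter_to_mane(rows: Iterable[Mapping], mane_tx_acs: Set[str],
                   ignore_version: bool = False) -> List[Mapping]:
    """Keep only rows whose "tx_ac" is in the MANE set (hypothetical helper).

    With ignore_version=True, accessions are compared without their
    version suffix, so NM_X.3 matches a MANE entry NM_X.2.
    """
    if ignore_version:
        wanted = {_unversioned(ac) for ac in mane_tx_acs}
        return [row for row in rows if _unversioned(row["tx_ac"]) in wanted]
    return [row for row in rows if row["tx_ac"] in mane_tx_acs]
```

Version-insensitive matching helps when the provider holds a different version of a transcript than the MANE release lists, at the cost of treating those versions as equivalent.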

Nov 17 '24 16:11 andreasprlic

First, we need a canonical definition of canonical...

"Canonical" isn't a property of a transcript but a label/choice, made differently by each annotation provider (RefSeq/Ensembl), genome build, and release (e.g. MANE version).

E.g. using GFF/GTF tags:

Annotation	Build	GFF Tags
RefSeq	GRCh37	RefSeq select
RefSeq	GRCh38	RefSeq select, MANE
Ensembl	GRCh37	n/a - see notes below
Ensembl	GRCh38	Ensembl_canonical, MANE

Ensembl GRCh37: "canonical" is not a GFF/GTF tag, but it is exposed in the Ensembl REST API.

So, given the choice between:

Pick canonical server-side

Add a new API request for the canonical transcript
Add a new field {"canonical": True}

Pick canonical client-side

The data provider's get_tx_for_region and get_tx_for_gene return transcript "tags" (e.g. "Ensembl_canonical, MANE")
The client consumes the transcripts/tags and then uses a CanonicalPicker class to decide which one is canonical
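A minimal sketch of the client-side option, with `CanonicalPicker` as an abstract policy and the `LongestTranscriptCanonicalPicker` idea listed below as the simplest concrete policy. The `TaggedTranscript` row shape (and where its `length` and `tags` would come from) is an assumption; the data provider does not return such objects today.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class TaggedTranscript:
    """Hypothetical row shape: a transcript plus provider tags."""
    tx_ac: str
    gene: str
    length: int  # transcript length in nucleotides (assumed to be available)
    tags: Tuple[str, ...] = ()


class CanonicalPicker(ABC):
    """Client-side policy for choosing one transcript among candidates."""

    @abstractmethod
    def pick(self, transcripts: Sequence[TaggedTranscript]) -> Optional[TaggedTranscript]:
        """Return the chosen transcript, or None if there are no candidates."""


class LongestTranscriptCanonicalPicker(CanonicalPicker):
    """Ignores tags entirely, so it works without any changes to UTA."""

    def pick(self, transcripts: Sequence[TaggedTranscript]) -> Optional[TaggedTranscript]:
        if not transcripts:
            return None
        # Longest wins; ties go to the lexicographically smallest accession
        # so the choice is deterministic.
        return min(transcripts, key=lambda tx: (-tx.length, tx.tx_ac))
```

Keeping the policy behind one `pick()` method is what lets the client swap in a tag-based or MANE-file-based picker later without touching the data provider.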

I lean towards the client-side approach because:

The client decides the details of how to pick the canonical transcript.
We don't have to wait for UTA: we can immediately implement a LongestTranscriptCanonicalPicker, or a local version that loads the MANE text file and uses it together with the transcripts to pick.
We can write a client picker that reads tags, selects e.g. those containing MANE, Ensembl_canonical or RefSeq select, sorts them, and returns the highest-ranked transcript per desired contig. This will be ready to go once UTA has tags in it.
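A sketch of the tag-reading picker described in the last point. The priority order, the case-insensitive substring matching, and the `TaggedTranscript`/`TagCanonicalPicker` names are all assumptions for illustration; nothing here depends on UTA supporting tags yet.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Assumed preference order (earlier wins). Keywords are matched case-insensitively
# as substrings of each tag, so "MANE" also matches "MANE Select" or "MANE_Select".
DEFAULT_PRIORITY = ("MANE", "Ensembl_canonical", "RefSeq select")


@dataclass(frozen=True)
class TaggedTranscript:
    """Hypothetical row shape: a transcript, the contig it is aligned to, and its tags."""
    tx_ac: str
    contig: str
    tags: Tuple[str, ...] = ()


class TagCanonicalPicker:
    """Pick the best-tagged transcript on a requested contig (hypothetical class)."""

    def __init__(self, priority: Sequence[str] = DEFAULT_PRIORITY):
        self._priority = [key.lower() for key in priority]

    def _rank(self, tx: TaggedTranscript) -> Optional[int]:
        # Position of the best keyword found in any of the tags; None if none match.
        ranks = [i for i, key in enumerate(self._priority)
                 for tag in tx.tags if key in tag.lower()]
        return min(ranks) if ranks else None

    def pick(self, transcripts: Sequence[TaggedTranscript],
             contig: str) -> Optional[TaggedTranscript]:
        candidates = []
        for tx in transcripts:
            rank = self._rank(tx)
            # Only transcripts on the desired contig with a recognized tag are eligible.
            if tx.contig == contig and rank is not None:
                candidates.append((rank, tx.tx_ac, tx))
        if not candidates:
            return None
        # Best rank first; accession breaks ties deterministically.
        return min(candidates, key=lambda c: (c[0], c[1]))[2]
```

Because matching is by substring, "MANE" also catches "MANE Plus Clinical"; a stricter picker would use more specific keywords.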

I guess one question is whether UTA should contain multiple versions of MANE. If so, we'd need to pass the MANE version in somewhere in the API. Alternatively, UTA could just return tags with versions, e.g. ["MANE:v1.3", "MANE:v1.4"], and the picking would again be done in the client.
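If versions are returned as tags, the client-side picking could look roughly like this. The `SOURCE:vMAJOR.MINOR` tag format follows the example above but is an assumption, as are the helper names `parse_versioned_tag()` and `mane_transcripts()`.

```python
import re
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Assumed tag format, following the example above: "<SOURCE>:v<MAJOR>.<MINOR>"
_VERSIONED_TAG = re.compile(r"^(?P<source>[^:]+):v(?P<major>\d+)\.(?P<minor>\d+)$")


def parse_versioned_tag(tag: str) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Parse e.g. "MANE:v1.4" into ("MANE", (1, 4)); None if the tag is unversioned."""
    m = _VERSIONED_TAG.match(tag)
    if m is None:
        return None
    return m["source"], (int(m["major"]), int(m["minor"]))


def mane_transcripts(tx_tags: Dict[str, Iterable[str]],
                     version: Optional[Tuple[int, int]] = None) -> List[str]:
    """Return tx_acs tagged with the given MANE version (default: newest present)."""
    by_version: Dict[Tuple[int, int], Set[str]] = {}
    for tx_ac, tags in tx_tags.items():
        for tag in tags:
            parsed = parse_versioned_tag(tag)
            if parsed is not None and parsed[0] == "MANE":
                by_version.setdefault(parsed[1], set()).add(tx_ac)
    if not by_version:
        return []
    chosen = version if version is not None else max(by_version)
    return sorted(by_version.get(chosen, set()))
```

Versions are compared as integer tuples, so v1.10 sorts after v1.4, which a plain string comparison would get wrong.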

Jul 29 '25 04:07 davmlaw

I'm going to remove this from the hgvs 2.0 milestone. I agree that biocommons tools should make it easy to identify a MANE transcript, but I'm a bit skeptical that this should be in the hgvs package itself.

Oct 15 '25 16:10 reece