MGI GAF includes PRO isoforms as the main annotatable object

Open cmungall opened this issue 3 months ago • 10 comments

✗ getgaf mgi | egrep  '\tprotein\t'  | tail
PR	Q9Z2D6-2	mMECP2/iso:2	located_in	GO:0005634	PMID:18334558	IDA		C	methyl-CpG-binding protein 2 isoform 2 (mouse)	mMECP2/iso:2|MECP2b (mouse)|MECP2e1 (mouse)	protein	taxon:10090	20120126	MGI
PR	Q9Z2D6-1	mMECP2/iso:1	located_in	GO:0005634	PMID:18334558	IDA		C	methyl-CpG-binding protein 2 isoform 1 (mouse)	mMECP2/iso:1|MECP2a (mouse)|MECP2e2 (mouse)	protein	taxon:10090	20101014	MGI
PR	Q9Z2D6-2	mMECP2/iso:2	located_in	GO:0005634	PMID:15034150	IDA		C	methyl-CpG-binding protein 2 isoform 2 (mouse)	mMECP2/iso:2|MECP2b (mouse)|MECP2e1 (mouse)	protein	taxon:10090	20120126	MGI
PR	Q9Z2D6-2	mMECP2/iso:2	acts_upstream_of_or_within	GO:0006641	PMID:30137367	IMP	MGI:MGI:5584016	P	methyl-CpG-binding protein 2 isoform 2 (mouse)	mMECP2/iso:2|MECP2b (mouse)|MECP2e1 (mouse)	protein	taxon:10090	20200304	MGI
PR	Q9Z2D6-1	mMECP2/iso:1	part_of	GO:0000792	PMID:18334558	IDA		C	methyl-CpG-binding protein 2 isoform 1 (mouse)	mMECP2/iso:1|MECP2a (mouse)|MECP2e2 (mouse)	protein	taxon:10090	20201009	MGI
PR	Q9Z2D6-2	mMECP2/iso:2	enables	GO:0003682	PMID:18334558	IDA		F	methyl-CpG-binding protein 2 isoform 2 (mouse)	mMECP2/iso:2|MECP2b (mouse)|MECP2e1 (mouse)	protein	taxon:10090	20120126	MGI
PR	Q08460-4	mKCNMA1/iso:4	enables	GO:0015269	PMID:16081418	IDA		F	calcium-activated potassium channel subunit alpha-1 isoform 4 (mouse)	mKCNMA1/iso:4|calcium-activated potassium channel subunit alpha-1 isoform STREX-1 (mouse)	protein	taxon:10090	20080813	MGI
PR	Q08460-1	mKCNMA1/iso:1	enables	GO:0015269	PMID:16081418	IDA		F	calcium-activated potassium channel subunit alpha-1 isoform 1 (mouse)	mKCNMA1/iso:1	protein	taxon:10090	20140331	MGI
PR	Q08460-1	mKCNMA1/iso:1	enables	GO:0005249	PMID:16081418	IDA		F	calcium-activated potassium channel subunit alpha-1 isoform 1 (mouse)	mKCNMA1/iso:1	protein	taxon:10090	20140331	MGI
PR	Q08460-4	mKCNMA1/iso:4	enables	GO:0005249	PMID:16081418	IDA		F	calcium-activated potassium channel subunit alpha-1 isoform 4 (mouse)	mKCNMA1/iso:4|calcium-activated potassium channel subunit alpha-1 isoform STREX-1 (mouse)	protein	taxon:10090	20080811	MGI

These should not be here. Instead, the annotation should be rolled up to the gene (e.g. Kcnma1 in the case of Q08460), and the isoform ID should go in column 17 (Gene Product Form ID).

Here is an example of how it should be done:

MGI	MGI:1926176	Gas2l1	located_in	GO:0005737	MGI:MGI:3052497|PMID:12584248	IDA		C	growth arrest-specific 2 like 1	4930500E24Rik|D0Jmb1|GAR22|TU-71.1	protein_coding_gene	taxon:10090	20120921	UniProt	part_of(CL:0000586)|part_of(CL:0000017)	UniProtKB:Q8JZP9-2
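
To make the expected transformation concrete, here is a minimal Python sketch of the roll-up for a single GAF row. It is not the actual go-site code: the `Gene` record, the `roll_up` function, and the `isoform_to_gene` lookup are hypothetical names, and the sketch assumes that a mapping from PRO isoform IDs to MGI genes is already available from somewhere. It keeps the PR-prefixed ID in column 17; whether that ID should instead be rewritten to a UniProtKB ID is a separate decision.

```python
from typing import Dict, NamedTuple

N_COLS = 17  # GAF 2.x: columns 16 (extension) and 17 (gene product form) are optional


class Gene(NamedTuple):
    """Hypothetical gene record used to overwrite the annotated object."""
    db: str           # column 1, e.g. "MGI"
    object_id: str    # column 2
    symbol: str       # column 3
    name: str         # column 10
    synonyms: str     # column 11, pipe-separated
    object_type: str  # column 12, e.g. "protein_coding_gene"


def roll_up(line: str, isoform_to_gene: Dict[str, Gene]) -> str:
    """Return the row with a PR isoform replaced by its gene.

    Rows that are not PR rows, are too short, or have no known gene
    are returned unchanged.
    """
    row = line.rstrip("\n")
    cols = row.split("\t")
    if len(cols) < 15 or cols[0] != "PR":
        return row
    isoform_curie = f"PR:{cols[1]}"
    gene = isoform_to_gene.get(isoform_curie)
    if gene is None:
        return row
    cols += [""] * (N_COLS - len(cols))  # pad out to column 17
    cols[0] = gene.db
    cols[1] = gene.object_id
    cols[2] = gene.symbol
    cols[9] = gene.name
    cols[10] = gene.synonyms
    cols[11] = gene.object_type
    cols[16] = isoform_curie  # isoform moves to column 17
    return "\t".join(cols)
```

Called on one of the Q08460-4 rows above with a lookup entry for `PR:Q08460-4`, this would yield a row whose first three columns describe Kcnma1 and whose column 17 carries the isoform, which is the same shape as the Gas2l1 example.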

Note that the behavior is correct for all UniProt-sourced annotations and incorrect for MGI-sourced ones (which use PRO).

I assume that this is a matter of the roll-up code needing to deal with both PRO isoforms and UniProt isoforms. The situation is inherently confusing because in many cases the local IDs are the same (e.g. Q08460-4), yet the actual prefixed ID is arbitrarily different (e.g. PR:Q08460-4 vs. UniProtKB:Q08460-4).
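
One way the roll-up code could treat both prefixes uniformly is to parse the local ID rather than match on the full CURIE. The sketch below is an illustration only, not the existing implementation: `parse_isoform` and `ISOFORM_PREFIXES` are hypothetical names, and it assumes that the shared local IDs follow the UniProt "accession, hyphen, isoform number" pattern. PRO IDs that do not follow that pattern are deliberately rejected, since they cannot be mapped this way.

```python
import re
from typing import Optional, Tuple

# UniProt accession pattern followed by "-<isoform number>".
_ISOFORM_LOCAL_ID = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
    r"-(\d+)$"
)

# Prefixes assumed to carry UniProt-style isoform local IDs.
ISOFORM_PREFIXES = {"PR", "UniProtKB"}


def parse_isoform(curie: str) -> Optional[Tuple[str, str, int]]:
    """Split an isoform CURIE into (prefix, parent accession, isoform number).

    Returns None for unknown prefixes and for local IDs that are not
    UniProt-style isoform IDs.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in ISOFORM_PREFIXES:
        return None
    match = _ISOFORM_LOCAL_ID.match(local_id)
    if match is None:
        return None
    return prefix, match.group(1), int(match.group(2))
```

With this, `PR:Q08460-4` and `UniProtKB:Q08460-4` both resolve to parent accession `Q08460` and isoform number 4, so a single gene lookup keyed on the parent accession could serve both sources.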

cmungall, Apr 10 '24 16:04