
RNAC RNA types are getting mangled by the pipeline (tested by gorule-0000001)

Open cmungall opened this issue 5 months ago • 27 comments

Source:

✗ curl -L -s https://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human_rna.gaf.gz | gzip -dc | grep URS000075D95B_9606 | cut -f2,3,9-12
URS00004176D4_9606	URS00004176D4_9606	F	Homo sapiens (human) hsa-miR-185-5p		miRNA
URS000075D95B_9606	URS000075D95B_9606	F	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA
URS000075D95B_9606	URS000075D95B_9606	F	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA
URS000075D95B_9606	URS000075D95B_9606	F	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA
URS000075D95B_9606	URS000075D95B_9606	P	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA
URS000075D95B_9606	URS000075D95B_9606	P	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA
URS000075D95B_9606	URS000075D95B_9606	P	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA
URS000075D95B_9606	URS000075D95B_9606	P	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA
URS000075D95B_9606	URS000075D95B_9606	P	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA
URS000075D95B_9606	URS000075D95B_9606	C	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA
URS000075D95B_9606	URS000075D95B_9606	C	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA
URS000075D95B_9606	URS000075D95B_9606	C	Homo sapiens (human) X inactive specific transcript (XIST)		lncRNA

What we end up publishing:

✗ curl -L -s http://current.geneontology.org/annotations/goa_human_rna.gaf.gz | gzip -dc | grep URS000075D95B_9606 | cut -f2,3,9-12
URS00004176D4_9606	URS00004176D4_9606	F	Homo sapiens (human) hsa-miR-185-5p		miRNA
URS000075D95B_9606	URS000075D95B_9606	F	Homo sapiens (human) X inactive specific transcript (XIST)		gene_product
URS000075D95B_9606	URS000075D95B_9606	F	Homo sapiens (human) X inactive specific transcript (XIST)		gene_product
URS000075D95B_9606	URS000075D95B_9606	P	Homo sapiens (human) X inactive specific transcript (XIST)		gene_product
URS000075D95B_9606	URS000075D95B_9606	P	Homo sapiens (human) X inactive specific transcript (XIST)		gene_product
URS000075D95B_9606	URS000075D95B_9606	P	Homo sapiens (human) X inactive specific transcript (XIST)		gene_product
URS000075D95B_9606	URS000075D95B_9606	P	Homo sapiens (human) X inactive specific transcript (XIST)		gene_product
URS000075D95B_9606	URS000075D95B_9606	P	Homo sapiens (human) X inactive specific transcript (XIST)		gene_product
URS000075D95B_9606	URS000075D95B_9606	C	Homo sapiens (human) X inactive specific transcript (XIST)		gene_product
URS000075D95B_9606	URS000075D95B_9606	C	Homo sapiens (human) X inactive specific transcript (XIST)		gene_product
URS000075D95B_9606	URS000075D95B_9606	C	Homo sapiens (human) X inactive specific transcript (XIST)		gene_product
  1. The RNA type should be preserved
  2. We should have a specific QC check on RNAcentral: anything with an RNAcentral ID must be an RNA subtype
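
A minimal sketch of what the check in item 2 could look like, assuming GAF 2.x column order (column 1 is the DB, column 12 is the DB Object Type) and assuming RNAcentral rows carry `RNAcentral` in column 1. The function name and the `RNA_TYPES` allow-list are hypothetical and deliberately incomplete; this is not how gorule-0000001 is implemented. A real check would more likely test for descendants of an RNA term in the Sequence Ontology.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: illustrative, incomplete allow-list of RNA type labels.
RNA_TYPES = {
    "RNA", "ncRNA", "miRNA", "lncRNA", "rRNA",
    "tRNA", "snRNA", "snoRNA", "piRNA",
}

DB_COL = 0            # GAF column 1: DB
OBJECT_ID_COL = 1     # GAF column 2: DB Object ID
OBJECT_TYPE_COL = 11  # GAF column 12: DB Object Type


def non_rna_rnacentral_rows(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line number, object ID, object type) for every RNAcentral
    row whose DB Object Type is not in RNA_TYPES."""
    for line_number, line in enumerate(lines, start=1):
        # Skip GAF header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip rows too short to have a DB Object Type column.
        if len(fields) <= OBJECT_TYPE_COL:
            continue
        # Assumption: RNAcentral rows are identified by the DB column.
        if fields[DB_COL] != "RNAcentral":
            continue
        if fields[OBJECT_TYPE_COL] not in RNA_TYPES:
            yield line_number, fields[OBJECT_ID_COL], fields[OBJECT_TYPE_COL]
```

Run over the published file (for example, lines read via `gzip.open(path, "rt")`), a check like this would flag the `gene_product` rows shown above and pass the `lncRNA` rows in the source.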

Aside for @alexsign (this should probably be its own ticket):

Why don't we get gene symbols for RNA types? This one (Xist) clearly has one (https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:12810), so why don't we just propagate it across from HGNC?

And not to overstuff this issue, but there are also problems with general RNAcentral/HGNC propagation on AGR. Recall that AGR uses HGNC IDs: https://www.alliancegenome.org/gene/HGNC:12810 shows no GO annotation, even though this gene obviously has a known function: https://amigo.geneontology.org/amigo/gene_product/RNAcentral:URS000075D95B_9606

cmungall, Jan 29 '24 16:01