go-site
RNAC RNA types are getting mangled by the pipeline (tested by gorule-0000001)
Source:
✗ curl -L -s https://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human_rna.gaf.gz | gzip -dc | grep URS000075D95B_9606 | cut -f2,3,9-12
URS00004176D4_9606 URS00004176D4_9606 F Homo sapiens (human) hsa-miR-185-5p miRNA
URS000075D95B_9606 URS000075D95B_9606 F Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
URS000075D95B_9606 URS000075D95B_9606 F Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
URS000075D95B_9606 URS000075D95B_9606 F Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
URS000075D95B_9606 URS000075D95B_9606 P Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
URS000075D95B_9606 URS000075D95B_9606 P Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
URS000075D95B_9606 URS000075D95B_9606 P Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
URS000075D95B_9606 URS000075D95B_9606 P Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
URS000075D95B_9606 URS000075D95B_9606 P Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
URS000075D95B_9606 URS000075D95B_9606 C Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
URS000075D95B_9606 URS000075D95B_9606 C Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
URS000075D95B_9606 URS000075D95B_9606 C Homo sapiens (human) X inactive specific transcript (XIST) lncRNA
What we end up publishing:
✗ curl -L -s http://current.geneontology.org/annotations/goa_human_rna.gaf.gz | gzip -dc | grep URS000075D95B_9606 | cut -f2,3,9-12
URS00004176D4_9606 URS00004176D4_9606 F Homo sapiens (human) hsa-miR-185-5p miRNA
URS000075D95B_9606 URS000075D95B_9606 F Homo sapiens (human) X inactive specific transcript (XIST) gene_product
URS000075D95B_9606 URS000075D95B_9606 F Homo sapiens (human) X inactive specific transcript (XIST) gene_product
URS000075D95B_9606 URS000075D95B_9606 P Homo sapiens (human) X inactive specific transcript (XIST) gene_product
URS000075D95B_9606 URS000075D95B_9606 P Homo sapiens (human) X inactive specific transcript (XIST) gene_product
URS000075D95B_9606 URS000075D95B_9606 P Homo sapiens (human) X inactive specific transcript (XIST) gene_product
URS000075D95B_9606 URS000075D95B_9606 P Homo sapiens (human) X inactive specific transcript (XIST) gene_product
URS000075D95B_9606 URS000075D95B_9606 P Homo sapiens (human) X inactive specific transcript (XIST) gene_product
URS000075D95B_9606 URS000075D95B_9606 C Homo sapiens (human) X inactive specific transcript (XIST) gene_product
URS000075D95B_9606 URS000075D95B_9606 C Homo sapiens (human) X inactive specific transcript (XIST) gene_product
URS000075D95B_9606 URS000075D95B_9606 C Homo sapiens (human) X inactive specific transcript (XIST) gene_product
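To see how widespread the rewrite is beyond this one ID, the two files could be compared on column 12 directly. A minimal sketch, assuming both GAFs have already been downloaded and decompressed into iterables of lines; `types_by_id` and `changed_types` are hypothetical helper names, not part of the pipeline:

```python
from collections import defaultdict


def types_by_id(lines):
    """Map each DB Object ID (column 2) to the set of types (column 12) seen for it."""
    seen = defaultdict(set)
    for line in lines:
        if line.startswith("!"):  # skip GAF header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12:
            seen[cols[1]].add(cols[11])
    return seen


def changed_types(source_lines, published_lines):
    """Return {id: (source_types, published_types)} for IDs whose type sets differ."""
    source = types_by_id(source_lines)
    published = types_by_id(published_lines)
    return {
        obj_id: (source[obj_id], published[obj_id])
        for obj_id in source.keys() & published.keys()
        if source[obj_id] != published[obj_id]
    }
```

For the rows above this would report `URS000075D95B_9606` going from `lncRNA` to `gene_product`, and nothing for the miRNA row.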
- The RNA type should be preserved
- We should have a specific QC check for RNAC: anything with an RNAC ID must be an RNA subtype
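The second point could look roughly like the sketch below. It assumes column 1 carries the `RNAcentral` prefix and column 12 the type, as in the output above. The `RNA_TYPES` allow-list and the function name `find_bad_rnacentral_types` are illustrative assumptions; a real rule would more likely take the allowed types from an ontology than from a hard-coded set.

```python
# Illustrative allow-list only (assumption); not the pipeline's actual set of RNA types.
RNA_TYPES = {"lncRNA", "miRNA", "ncRNA", "rRNA", "tRNA", "snRNA", "snoRNA", "piRNA"}


def find_bad_rnacentral_types(lines, allowed=RNA_TYPES):
    """Yield (line_number, id, type) for RNAcentral rows whose type is not an RNA subtype."""
    for n, line in enumerate(lines, start=1):
        if line.startswith("!"):  # skip GAF header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:  # too short to carry a type column; leave to other checks
            continue
        db, obj_id, obj_type = cols[0], cols[1], cols[11]
        if db == "RNAcentral" and obj_type not in allowed:
            yield n, obj_id, obj_type
```

Run over the published file, this would flag every `gene_product` row shown above while leaving the `miRNA` row and non-RNAcentral rows alone.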
Aside for @alexsign (should probably be its own ticket):
Why don't we get gene symbols for RNA types? This one (Xist) clearly has one https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:12810 - why don't we just propagate across from HGNC?
And not to overstuff this issue, but there are issues with general RNAC/HGNC propagation on AGR. Recall AGR uses HGNC IDs: https://www.alliancegenome.org/gene/HGNC:12810 shows no GO annotation
Even though this gene obviously has a known function: https://amigo.geneontology.org/amigo/gene_product/RNAcentral:URS000075D95B_9606