RTX-KG2

Single Exon node with the name `Exon`

Open · dkoslicki opened this issue · 3 comments

I might have mentioned this before, but there is only a single node with the category biolink:Exon: a node named Exon. I think either the ETL of whichever KP provides exon information is borked, or something else fishy is going on. If neither, should this node (and the category) just be removed?
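To check whether other categories are in the same situation, one could tally node categories and flag every category that has exactly one node. A minimal Python sketch, assuming the nodes have already been exported as dicts with `id` and `category` keys (the shape of the KG2 node JSON dumps in this thread); `singleton_categories` is a hypothetical helper, not part of the KG2 codebase:

```python
from collections import Counter


def singleton_categories(nodes):
    """Return {category: node_id} for every category that has exactly one node.

    `nodes` is an iterable of dicts with at least "id" and "category" keys,
    shaped like the KG2 node JSON shown in this thread (assumed export format).
    """
    nodes = list(nodes)  # accept generators; the data is iterated twice
    counts = Counter(node["category"] for node in nodes)
    return {
        node["category"]: node["id"]
        for node in nodes
        if counts[node["category"]] == 1
    }


# Illustrative input: a handful of nodes copied from this thread.
EXAMPLE_NODES = [
    {"id": "EFO:0004423", "category": "biolink:Exon"},
    {"id": "EFO:0004422", "category": "biolink:PhysicalEntity"},
    {"id": "EFO:0004420", "category": "biolink:PhysicalEntity"},
    {"id": "SO:0000704", "category": "biolink:Gene"},
]

print(singleton_categories(EXAMPLE_NODES))
```

On a toy sample like this one, biolink:Gene is flagged too; on the full node set only genuinely sparse categories should remain. Against the live graph, and assuming `category` is stored as a single string property as in the dumps here, the same tally could be done server-side with a Cypher aggregation such as `MATCH (n) RETURN n.category AS category, count(*) AS num_nodes ORDER BY num_nodes`.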

dkoslicki · Feb 19 '24 18:02

This is the single biolink:Exon node in KG2 (checked in RTX-KG2.9.0pre):

{
  "iri": "http://www.ebi.ac.uk/efo/EFO_0004423",
  "synonym": [
    "exonic region"
  ],
  "category_label": "exon",
  "deprecated": "False",
  "name": "exon",
  "description": "An exon is a nucleic acid sequence that is represented in the mature form of an RNA molecule either after portions of a precursor RNA (introns) have been removed by cis-splicing or when two or more precursor RNA molecules have been ligated by trans-splicing.",
  "provided_by": "['infores:efo']",
  "id": "EFO:0004423",
  "category": "biolink:Exon",
  "update_date": "3630"
}

This node comes from EFO, which is part of the multi-ontology load process. I would not be surprised if that ETL is "borked". I will look into where this node is coming from.

ecwood · Jun 26 '24 20:06

Here is the term in efo.owl:

    <!-- http://www.ebi.ac.uk/efo/EFO_0004423 -->

    <owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0004423">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/BFO_0000040"/>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
                <owl:someValuesFrom rdf:resource="http://www.ebi.ac.uk/efo/EFO_0004422"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <obo:IAO_0000115>An exon is a nucleic acid sequence that is represented in the mature form of an RNA molecule either after portions of a precursor RNA (introns) have been removed by cis-splicing or when two or more precursor RNA molecules have been ligated by trans-splicing.</obo:IAO_0000115>
        <oboInOwl:hasDbXref>NCIt:C13231</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref>SNOMEDCT:33091005</oboInOwl:hasDbXref>
        <oboInOwl:hasExactSynonym>exonic region</oboInOwl:hasExactSynonym>
        <rdfs:label>exon</rdfs:label>
    </owl:Class>
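Comparing this class with the KG2 node JSON above, `rdf:about` lines up with `iri`, `rdfs:label` with `name`, `obo:IAO_0000115` with `description`, and `oboInOwl:hasExactSynonym` with `synonym`. The sketch below pulls those fields, plus the named superclass and the `BFO_0000050` (part of) restriction, out of the class using Python's standard `xml.etree.ElementTree`. It only illustrates that correspondence and is not KG2's actual ETL code; `class_to_node` is a hypothetical helper, and the namespace URIs are the standard RDF/OWL/OBO ones, assumed to match the prefix declarations at the top of efo.owl.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs (assumed to match the prefixes declared in efo.owl).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "obo": "http://purl.obolibrary.org/obo/",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]
PART_OF = "http://purl.obolibrary.org/obo/BFO_0000050"


def class_to_node(cls):
    """Map one owl:Class element onto KG2-style node fields (illustrative only)."""
    part_of = []
    for restriction in cls.findall("rdfs:subClassOf/owl:Restriction", NS):
        prop = restriction.find("owl:onProperty", NS)
        filler = restriction.find("owl:someValuesFrom", NS)
        if prop is not None and filler is not None and prop.get(RDF_RESOURCE) == PART_OF:
            part_of.append(filler.get(RDF_RESOURCE))
    return {
        "iri": cls.get(RDF_ABOUT),
        "name": cls.findtext("rdfs:label", namespaces=NS),
        "description": cls.findtext("obo:IAO_0000115", namespaces=NS),
        "synonym": [s.text for s in cls.findall("oboInOwl:hasExactSynonym", NS)],
        "xrefs": [x.text for x in cls.findall("oboInOwl:hasDbXref", NS)],
        # Only named superclasses carry rdf:resource; restrictions are handled above.
        "superclasses": [
            sc.get(RDF_RESOURCE)
            for sc in cls.findall("rdfs:subClassOf", NS)
            if sc.get(RDF_RESOURCE) is not None
        ],
        "part_of": part_of,
    }


# The EFO_0004423 class quoted above, wrapped in an rdf:RDF root.
EXON_OWL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:obo="http://purl.obolibrary.org/obo/"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0004423">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/BFO_0000040"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
        <owl:someValuesFrom rdf:resource="http://www.ebi.ac.uk/efo/EFO_0004422"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <obo:IAO_0000115>An exon is a nucleic acid sequence that is represented in the mature form of an RNA molecule either after portions of a precursor RNA (introns) have been removed by cis-splicing or when two or more precursor RNA molecules have been ligated by trans-splicing.</obo:IAO_0000115>
    <oboInOwl:hasDbXref>NCIt:C13231</oboInOwl:hasDbXref>
    <oboInOwl:hasDbXref>SNOMEDCT:33091005</oboInOwl:hasDbXref>
    <oboInOwl:hasExactSynonym>exonic region</oboInOwl:hasExactSynonym>
    <rdfs:label>exon</rdfs:label>
  </owl:Class>
</rdf:RDF>"""

node = class_to_node(ET.fromstring(EXON_OWL).find("owl:Class", NS))
print(node["name"], node["synonym"], node["superclasses"], node["part_of"])
```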

ecwood · Jun 26 '24 22:06

EFO:0004423 is a subclass of material entity (BFO:0000040), as are several similar terms. The same issue appears to affect other subclasses of material entity, such as enzyme (OBI:0000427):

{
  "iri": "http://purl.obolibrary.org/obo/OBI_0000427",
  "category_label": "protein",
  "deprecated": "False",
  "name": "enzyme",
  "description": "(protein or rna) or has_part (protein or rna) and has_function some GO:0003824 (catalytic activity); (protein or rna) or has_part (protein or rna) and has_function some GO:0003824 (catalytic activity)",
  "provided_by": "['infores:efo', 'infores:genepio']",
  "id": "OBI:0000427",
  "category": "biolink:Protein",
  "update_date": "2024-02-21 01:39:56 GMT"
}

These are all of the subclasses of material entity: [screenshot not preserved]
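A list like this can also be pulled from efo.owl programmatically. A minimal sketch using `xml.etree.ElementTree.iterparse` from the Python standard library, assuming efo.owl is the RDF/XML file quoted above and that subclass axioms appear as `rdfs:subClassOf rdf:resource="..."` on `owl:Class` elements, as they do for EFO_0004423. It finds direct, named subclasses only (no transitive closure, no reasoning), so it may not match the screenshot exactly. `direct_subclasses` is a hypothetical helper, and the example.org class in the sample input is made up purely for illustration.

```python
import io
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
MATERIAL_ENTITY = "http://purl.obolibrary.org/obo/BFO_0000040"


def direct_subclasses(source, parent_iri=MATERIAL_ENTITY):
    """Yield the rdf:about IRI of every owl:Class declared as a named subclass of parent_iri.

    `source` is a path or binary file object. iterparse streams the file, and each
    class element is cleared after inspection to keep memory use down.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "{%s}Class" % OWL:
            continue
        parents = {
            sc.get("{%s}resource" % RDF)
            for sc in elem.findall("{%s}subClassOf" % RDFS)
        }
        iri = elem.get("{%s}about" % RDF)
        if iri is not None and parent_iri in parents:
            yield iri
        elem.clear()  # free the processed class element's children


# Tiny illustrative input: EFO_0004423 as quoted above, plus a hypothetical
# class (example.org IRIs) with an unrelated parent.
SAMPLE_OWL = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0004423">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/BFO_0000040"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/Hypothetical">
    <rdfs:subClassOf rdf:resource="http://example.org/SomeOtherParent"/>
  </owl:Class>
</rdf:RDF>"""

print(list(direct_subclasses(io.BytesIO(SAMPLE_OWL))))
```

Against a local copy of the ontology, the call would be `list(direct_subclasses("efo.owl"))`.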

Running the following Cypher query on kg2endpoint-kg2-9-0.rtx.ai:

    MATCH (n)
    WHERE n.iri IN [
      "http://purl.obolibrary.org/obo/BTO_0002690",
      "http://www.ebi.ac.uk/efo/EFO_0004446",
      "http://purl.obolibrary.org/obo/BTO_0000214",
      "http://www.ebi.ac.uk/efo/EFO_0000324",
      "http://purl.obolibrary.org/obo/GO_0005575",
      "http://www.ebi.ac.uk/efo/EFO_0006794",
      "http://purl.obolibrary.org/obo/CHEBI_24431",
      "http://www.ebi.ac.uk/efo/EFO_0005066",
      "http://www.ebi.ac.uk/efo/EFO_0000469",
      "http://purl.obolibrary.org/obo/OBI_0000427",
      "http://www.ebi.ac.uk/efo/EFO_0004422",
      "http://www.ebi.ac.uk/efo/EFO_0004423",
      "http://purl.obolibrary.org/obo/SO_0000704",
      "http://www.ebi.ac.uk/efo/EFO_0004420",
      "http://www.ebi.ac.uk/efo/EFO_0000548",
      "http://www.ebi.ac.uk/efo/EFO_0005060",
      "http://purl.obolibrary.org/obo/OBI_0100026",
      "http://www.ebi.ac.uk/efo/EFO_0000635",
      "http://purl.obolibrary.org/obo/OBI_0000245",
      "http://purl.obolibrary.org/obo/MPATH_0",
      "http://www.ebi.ac.uk/efo/EFO_0000663",
      "http://purl.obolibrary.org/obo/OBI_0000181",
      "http://www.ebi.ac.uk/efo/EFO_0010579",
      "http://purl.obolibrary.org/obo/OBI_0100051",
      "http://www.ebi.ac.uk/efo/EFO_0004359",
      "http://purl.obolibrary.org/obo/BTO_0001384"
    ]
    RETURN n.id, n.name, n.category, n.provided_by

returns:

n.id n.name n.category n.provided_by
"GO:0005575" "cellular_component" "biolink:CellularComponent" "['infores:efo', 'infores:cl', 'infores:go-plus', 'infores:hpo', 'infores:mondo', 'infores:nbo', 'infores:pato', 'infores:pr', 'infores:uberon', 'infores:go']"
"CHEBI:24431" "chemical entity" "biolink:MolecularEntity" "['infores:efo', 'infores:chebi', 'infores:cl', 'infores:disease-ontology', 'infores:foodon', 'infores:genepio', 'infores:go-plus', 'infores:hpo', 'infores:mondo', 'infores:nbo', 'infores:pato', 'infores:pr', 'infores:uberon']"
"OBI:0100026" "organism" "biolink:PhysicalEntity" "['infores:efo', 'infores:foodon', 'infores:genepio', 'infores:go-plus', 'infores:pato', 'infores:pr', 'infores:ro']"
"SO:0000704" "gene" "biolink:Gene" "['infores:efo', 'infores:disease-ontology', 'infores:go-plus', 'infores:mondo', 'infores:pr', 'infores:uberon']"
"OBI:0100051" "specimen" "biolink:PhysicalEntity" "['infores:efo', 'infores:genepio']"
"EFO:0006794" "cerebrospinal fluid biomarker measurement" "biolink:InformationContentEntity" "['infores:efo']"
"EFO:0000635" "organism part" "biolink:AnatomicalEntity" "['infores:efo']"
"EFO:0000663" "pool" "biolink:PhysicalEntity" "['infores:efo']"
"EFO:0005060" "instrument part" "biolink:PhysicalEntity" "['infores:efo']"
"EFO:0005066" "collection of material" "biolink:MaterialSample" "['infores:efo']"
"BTO:0000214" "cell culture" "biolink:PhysicalEntity" "['infores:efo']"
"EFO:0004423" "exon" "biolink:Exon" "['infores:efo']"
"EFO:0004422" "exome" "biolink:PhysicalEntity" "['infores:efo']"
"EFO:0004420" "genome" "biolink:PhysicalEntity" "['infores:efo']"
"EFO:0004446" "biological macromolecule" "biolink:MolecularEntity" "['infores:efo']"
"EFO:0000324" "cell type" "biolink:Cell" "['infores:efo']"
"EFO:0000548" "instrument" "biolink:PhysicalEntity" "['infores:efo']"
"EFO:0000469" "environmental factor" "biolink:PhysicalEntity" "['infores:efo']"
"EFO:0010579" "proteome" "biolink:PhysicalEntity" "['infores:efo']"
"OBI:0000245" "organization" "biolink:PhysicalEntity" "['infores:efo', 'infores:foodon', 'infores:genepio']"
"MPATH:0" "pathological entity" "biolink:BiologicalEntity" "['infores:efo', 'infores:genepio', 'infores:hpo']"
"OBI:0000427" "enzyme" "biolink:Protein" "['infores:efo', 'infores:genepio']"
"BTO:0001384" "tissue culture" "biolink:PhysicalEntity" "['infores:efo']"
"EFO:0004359" "telomere" "biolink:PhysicalEntity" "['infores:efo']"
"OBI:0000181" "population" "biolink:PhysicalEntity" "['infores:efo', 'infores:genepio']"
"BTO:0002690" "biofilm" "biolink:PhysicalEntity" "['infores:efo']"
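To quantify the problem, the result rows can be tallied by category and by provenance. Note that `provided_by` comes back as a stringified Python list (e.g. `"['infores:efo']"`), so the sketch parses it with `ast.literal_eval`. This is a minimal Python sketch over a few rows copied from the table above; `summarize` is a hypothetical helper, and in practice the rows would come from the query rather than being hard-coded.

```python
import ast
from collections import Counter

# A few (id, name, category, provided_by) rows copied from the result above.
ROWS = [
    ("EFO:0004423", "exon", "biolink:Exon", "['infores:efo']"),
    ("EFO:0004422", "exome", "biolink:PhysicalEntity", "['infores:efo']"),
    ("EFO:0000469", "environmental factor", "biolink:PhysicalEntity", "['infores:efo']"),
    ("OBI:0000181", "population", "biolink:PhysicalEntity", "['infores:efo', 'infores:genepio']"),
    ("OBI:0000427", "enzyme", "biolink:Protein", "['infores:efo', 'infores:genepio']"),
]


def summarize(rows):
    """Count rows per category and list the ids whose only source is infores:efo."""
    categories = Counter(category for _id, _name, category, _src in rows)
    efo_only = [
        node_id
        for node_id, _name, _category, provided_by in rows
        # provided_by is stored as a string like "['infores:efo']"; parse it safely.
        if ast.literal_eval(provided_by) == ["infores:efo"]
    ]
    return categories, efo_only


categories, efo_only = summarize(ROWS)
print(categories.most_common())
print(efo_only)
```

Tallied over all 26 rows above, this gives 15 biolink:PhysicalEntity nodes and 17 nodes whose only source is infores:efo.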

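Many of these category assignments seem problematic. For example, "cerebrospinal fluid biomarker measurement" is categorized as biolink:InformationContentEntity, "enzyme" (defined as protein or RNA) as biolink:Protein, and most of the remaining terms fall back to the generic biolink:PhysicalEntity.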

ecwood · Jun 26 '24 22:06