dipper
Data Ingestion Pipeline for Monarch
Some genes in OMIA don't exist in NCBI; they make connections (like orthology links) by linking gene labels. We should make orthology links by some text matching. (which of course...
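A minimal sketch of what label-based matching could look like, assuming an exact match on normalized labels is an acceptable first pass. The function names, identifiers, and labels below are hypothetical placeholders, not taken from OMIA, NCBI, or the dipper codebase.

```python
import re
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Lowercase the label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def match_by_label(unresolved, reference):
    """Pair each unresolved gene with reference genes sharing its normalized label.

    Both arguments map an identifier to a gene label. Genes without any
    match are left out of the result.
    """
    index = defaultdict(list)
    for ref_id, ref_label in reference.items():
        index[normalize_label(ref_label)].append(ref_id)
    return {
        gene_id: sorted(index[normalize_label(label)])
        for gene_id, label in unresolved.items()
        if normalize_label(label) in index
    }


# Hypothetical placeholder identifiers and labels, for illustration only.
omia_genes = {"EX:omia-gene-1": "abc-1", "EX:omia-gene-2": "XYZ9"}
ncbi_genes = {
    "EX:ncbi-gene-10": "ABC1",
    "EX:ncbi-gene-11": "Abc1",
    "EX:ncbi-gene-12": "DEF2",
}

candidates = match_by_label(omia_genes, ncbi_genes)
print(candidates)
```

A label can match several reference genes (for example, the same symbol in different species), so the sketch returns a list of candidates per gene rather than a single link; any real implementation would probably need to filter those candidates further.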
Currently, the implementation leads to patterns like: MGI:5578357 -[GENO:0000382]-> NCBIGene:80184 -[RO:0002162]-> NCBITaxon:9606, but it probably ought to be MGI:5578357 -[derives_sequence_from]-> NCBIGene:80184 (hsCEP290). The refactor should lead to similar stuff to what...
Discussed with @kshefchek. See: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0091172 https://www.broadinstitute.org/scientific-community/science/projects/mammals-models/dog/dog-genome-links https://research.nhgri.nih.gov/dog_genome/ Need to do more research on sources. @jmcmurry
**Source Name:** Rhea http://www.rhea-db.org/ **Source Data Description:** [About](http://www.rhea-db.org/webservice) Rhea comes from a joint ISB and EMBL-EBI effort and annotates biochemical reactions including enzymes, kinetic data and participants (linking to ChEBI)....
I need to debug this, but at present the ingest outputs a file with minimal metadata triples. Looks like it is failing on an initial call: DEBUG:urllib3.connectionpool:http://mychem.info:80 "POST /v1/drug?ids=RTAQQCXQSZGOHL-UHFFFAOYSA-N%2CWFKWXMTUELFFGS-UHFFFAOYSA-N%2CPXHVJJICTQNCMI-UHFFFAOYSA-N%2CSYQBFIAQOQZEGI-UHFFFAOYSA-N%2CGNPVGFCGXDBREM-UHFFFAOYSA-N%2CBUGBHKTXTAQXES-UHFFFAOYSA-N%2CQCWXUUIWCKQGHC-UHFFFAOYSA-N%2CWATWJIUSRGPENY-UHFFFAOYSA-N%2CLFNLGNPSGWYGGD-UHFFFAOYSA-N%2CTVFDJXOCXUVLDH-UHFFFAOYSA-N&fields=drugbank.targets%2Cdrugbank.drugbank_id%2Cunii.unii%2Cdrugcentral.drug_use%2Cdrugcentral.bioactivity...
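To narrow down where the call fails, it may help to rebuild the request URL outside the ingest. The sketch below only constructs the URL and sends nothing. The host, path, and the `ids`/`fields` parameter names are copied from the log line above; the helper name `build_drug_query` is hypothetical, and only the first two identifiers and fields from the log are used.

```python
from urllib.parse import urlencode

# Host and path as they appear in the log line; nothing is sent over the network.
BASE_URL = "http://mychem.info/v1/drug"


def build_drug_query(ids, fields):
    """Join ids and fields with commas and percent-encode them into a query string."""
    query = urlencode({"ids": ",".join(ids), "fields": ",".join(fields)})
    return f"{BASE_URL}?{query}"


url = build_drug_query(
    ["RTAQQCXQSZGOHL-UHFFFAOYSA-N", "WFKWXMTUELFFGS-UHFFFAOYSA-N"],
    ["drugbank.targets", "drugbank.drugbank_id"],
)
print(url)
```

`urlencode` turns each comma into `%2C`, which matches the encoding seen in the logged request, so the printed URL can be compared directly against the log or pasted into another client.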
How do we handle "phenotype manifest in" data? - Link them to the anatomical entity via 'located in' or 'expressed in' and/or; - Link them to UPheno classes with 'has...
http://web.expasy.org/cellosaurus/description.html However, consider the Creative Commons Attribution-NoDerivs License! We may need to get special permission to include and/or reimplement their sources, related to prelim work we had done before Cellosaurus...
@mbrush hpoa.ttl (exclusively) uses this unresolvable @base iri as a predicate 49,148 times: https://monarchinitiative.org/frequencyOfPhenotype followed by object literals such as: ``` 17,124 "hallmark" . 13,350 "occasional" . 13,249 "typical" ....
We are quite good about describing pre-transformed sources in TTL: http://data.monarchinitiative.org/ttl/ctd_dataset.ttl But this lacks things like a basic textual description of the data, info on what the nature of the...
'Models' are animals or cell line systems that can be used to study a particular condition or disease. An entity is asserted to model a disease using the is_model_of relation...