Anders Riutta
@kevinxin90, we're looking to add more fields (PMID, Pathway Ontology, Disease Ontology) to our data for BTE. What do you think about the format below (the values are just placeholders)?...
@kevinxin90, in this reporting period, @AlexanderPico and I have done more work to parse our PFOCR data, using APIs like [PubTator](https://www.ncbi.nlm.nih.gov/research/pubtator/api.html) to extract chemical and disease mentions in the OCR...
@andrewsu, you're also welcome to comment on this format if you're interested, especially on how to distinguish mentions from annotations.
@kevinxin90, that looks great. I just need to get you an updated file with this format.
Hi @kevinxin90, I wanted to let you know that I've extracted the chemicals from almost all of the PFOCR pathway figures (63591 of 64643, because the PubTator API returned an error...
@kevinxin90, here is an updated file in the format we discussed, including chemicals in addition to genes this time: https://www.dropbox.com/s/m03hd447oi3yjz1/pfocr_biothings_65k_20201203.ndjson?dl=0

## Summary Stats for pathway figures having at least 3...
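Because the export is NDJSON (one JSON object per line), counts like these summary stats can be recomputed by streaming the file line by line. Below is a minimal sketch. It assumes each record has a list-valued field. The key name `genes` and the sample records are hypothetical, for illustration only, and the real export's schema may differ.

```python
import json
from typing import Iterable, Iterator


def read_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line (NDJSON)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def count_records_with_min(records: Iterable[dict], field: str, minimum: int = 3) -> int:
    """Count records whose list-valued `field` has at least `minimum` entries.

    `field` is a hypothetical key name; records missing it count as empty.
    """
    return sum(1 for r in records if len(r.get(field, [])) >= minimum)


if __name__ == "__main__":
    # Hypothetical records for illustration only (not taken from the real export).
    sample = [
        '{"figure": "fig_a", "genes": ["TP53", "MDM2", "CDKN1A"]}',
        '{"figure": "fig_b", "genes": ["EGFR"]}',
        "",
    ]
    records = list(read_ndjson(sample))
    print(len(records), count_records_with_min(records, "genes"))  # 2 1
```

A file object iterates over its lines, so a downloaded copy could be passed directly, e.g. `read_ndjson(open("pfocr_biothings_65k_20201203.ndjson"))`, without loading the whole file into memory.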
Good catch! Yes, you're right. I'll take a look at filling those in.
Chemicals: yes. Diseases, pathways or amino acids: I'm not sure.
How inclusive should we be for chemicals? For example, some of our collaborators don't want side metabolites such as Na+, NADPH or S-adenosyl-L-methioninate included in their analyses. Thanks to @tokebe's suggestion,...
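One way to keep the export inclusive while still serving collaborators who want side metabolites excluded is to filter at analysis time against an exclusion list. Below is a minimal sketch. It assumes chemicals are plain name strings and uses the three examples named above as a hypothetical exclusion list. It is not necessarily the approach @tokebe suggested.

```python
from typing import Iterable, List

# Hypothetical exclusion list built from the examples in the comment above;
# a real list would need curation.
SIDE_METABOLITES = frozenset({"Na+", "NADPH", "S-adenosyl-L-methioninate"})


def drop_side_metabolites(
    chemicals: Iterable[str], exclude: Iterable[str] = SIDE_METABOLITES
) -> List[str]:
    """Return chemicals whose case-folded name is not excluded, preserving order."""
    excluded = {name.casefold() for name in exclude}
    return [c for c in chemicals if c.casefold() not in excluded]


print(drop_side_metabolites(["Glucose", "NADPH", "Na+", "Pyruvate"]))
# ['Glucose', 'Pyruvate']
```

Matching on names like this misses synonyms (e.g. "Sodium" vs. "Na+"). Filtering on normalized chemical identifiers would likely be more robust.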
I've got a draft version of the latest export file ready: https://github.com/wikipathways/pfocr-pipeline/raw/main/export/bte_chemicals_diseases_genes.ndjson Note that this version does not have PubMed IDs. Is it important that I provide them, or is this...