CEVOpen
📕 Documentation: Dictionary.xml and DictionaryDescription.md of: eoActivity
Here we describe the process of creating a [DictionaryName]DictionaryDescription.md document, which describes the contents of the individual dictionary named in the title of this Issue. That dictionary was created (or is in the process of being created) from data collected for Oil186.
I will begin this thread by pasting the contents of the INDEX description, followed by a first-draft copy below for discussion and direction.
 EO Activities
- Description: A dictionary of the names of 438 essential oil or constituent compound biochemical and/or biological activities, 340 of which resolved to wikidata IDs, and 336 with short descriptions.
- Filename: activity.xml
- File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/activity/activity.xml
Activity Dictionary
A dictionary of 184 activities mentioned in the 186 test articles downloaded from PubMed.
File Data
Filename: activity.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/activity/activity.xml
Table Column Headings
title: type of data to be normalized and tagged with Wikidata ID.
desc: data source
id: CM.activities.n where n is a serialized number
name: The name is a human readable string describing the concept.
term: The term is the precise string used to identify the concept. Name and Term are often the same.
wikidata: Unique identifier linked to Wikidata.org — a free and open knowledge base that can be read and edited by both humans and machines.
No. of source papers: 186
No. of Entries (Headers are not counted): 184
No. of unique compound names (including alternate spellings or synonyms): 184
No. of Chemical Compounds resolved in Wikidata: 74
No. of Chemical Compounds NOT resolved in Wikidata: 110
No source papers are listed. Should we assume 186, or delete that from Contents/Results?
We need to normalize the headings across all Dictionaries
This is the third case where the column heading “description” means something other than "data source / method of input"
In this case, is the column heading “id” related to Essoil? I don’t know how to describe it here. The format is: CM.activities.n where n is a serialized number
I don’t know how to describe the column headings for “Wikipedia” here
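To make the id format discussed above concrete, here is a minimal Python sketch. It assumes that "CM.activities.n" means the literal prefix `CM.activities.` followed by a whole number, and that numbering starts at 1; neither detail is confirmed in this thread, and the function names are hypothetical.

```python
import re

# Assumption: an id is the literal prefix "CM.activities." plus a whole number.
ID_PATTERN = re.compile(r"CM\.activities\.(\d+)")


def parse_entry_id(entry_id):
    """Return the serial number n from 'CM.activities.n', or None if malformed."""
    match = ID_PATTERN.fullmatch(entry_id)
    return int(match.group(1)) if match else None


def make_entry_ids(count, start=1):
    """Generate `count` consecutive ids; starting at 1 is an assumption."""
    return [f"CM.activities.{n}" for n in range(start, start + count)]
```

A check like `parse_entry_id(value) is None` could flag ids that drifted from the pattern while the dictionary is being cleaned.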
@petermr Currently working on cleaning the activities.xml dictionary.
Searching Wikidata for “antiacne” I found this entry:
https://www.wikidata.org/wiki/Q143139 "therapeutic subgroup of the Anatomical Therapeutic Chemical Classification System: Anti-acne preparations"
which led me to search and find this:
https://www.wikidata.org/wiki/Q192093 "classification of active ingredients of drugs according to the organ or system on which they act and their therapeutic, pharmacological and chemical properties."
and this: https://en.wikipedia.org/wiki/Anatomical_Therapeutic_Chemical_Classification_System
1. In the absence of a wikidata ID for "antiacne", should I:
   a) use no id at all,
   b) use https://www.wikidata.org/wiki/Q143139, or
   c) use the ID for "acne" and let users put 2 and 2 together about the "anti-" part?
2. Should we be adding the Anatomical Therapeutic Chemical Classification System's IDs to the activities dictionary as well as wikidata? https://www.whocc.no/atc_ddd_index/
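For reference, the manual search described above can also be run against the Wikidata API. The sketch below uses the `wbsearchentities` action of the MediaWiki API at wikidata.org; the helper names, the result limit, and the User-Agent string are illustrative assumptions, and the results would still need manual review before an ID is attributed to a term.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=5):
    """Build a wbsearchentities query URL for one dictionary term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def extract_candidates(response):
    """Reduce a parsed JSON response to (id, label, description) tuples."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


def search_wikidata(term):
    """Perform the network call and return candidate matches for review."""
    request = urllib.request.Request(
        build_search_url(term),
        headers={"User-Agent": "dictionary-cleanup-sketch/0.1"},  # hypothetical
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_candidates(json.load(reply))
```

Calling `search_wikidata("antiacne")` returns whatever candidates Wikidata currently holds; an empty list would correspond to the "no wikidata ID" case raised in question 1.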
Incidentally, the WHO Collaborating Centre for Drug Statistics Methodology also has useful ways to express the following, which may be useful as dictionaries as well.
- g = gram
- mg = milligram
- mcg = microgram
- U = unit
- TU = thousand units
- MU = million units
- mmol = millimole
- ml = milliliter (e.g. eyedrops)

Route of administration (Adm.R)
- Implant = Implant
- Inhal = Inhalation
- Instill = Instillation
- N = nasal
- O = oral
- P = parenteral
- R = rectal
- SL = sublingual/buccal/oromucosal
- TD = transdermal
- V = vaginal
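If these abbreviations were turned into dictionaries, the simplest form would be a pair of lookup tables. The following Python sketch copies the entries listed above; the names `UNITS`, `ROUTES` and `expand` are hypothetical, and keeping the lookup case-sensitive is an assumption made because the list distinguishes codes such as `U`, `TU` and `MU`.

```python
# Unit abbreviations as listed above.
UNITS = {
    "g": "gram",
    "mg": "milligram",
    "mcg": "microgram",
    "U": "unit",
    "TU": "thousand units",
    "MU": "million units",
    "mmol": "millimole",
    "ml": "milliliter",
}

# Route of administration (Adm.R) codes as listed above.
ROUTES = {
    "Implant": "Implant",
    "Inhal": "Inhalation",
    "Instill": "Instillation",
    "N": "nasal",
    "O": "oral",
    "P": "parenteral",
    "R": "rectal",
    "SL": "sublingual/buccal/oromucosal",
    "TD": "transdermal",
    "V": "vaginal",
}


def expand(abbreviation, table):
    """Return the full form of an abbreviation, or None if it is not listed."""
    return table.get(abbreviation)
```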
Thanks. Yes, I know about ATC. Our strategy should be to index against Wikidata first and only secondarily against any others. Wikidata should be a gateway. I doubt there are important things we can't do with Wikidata. We've agreed to use GBIF for plants not in Wikidata as they are so varied. The simple answer is probably to add antiacne to Wikidata and include the ATC ID. P.
In reply to Emanuel Faria's comment of Tue, Feb 11, 2020 at 7:53 PM.
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
Ok, I will add new entries as I go. If too time-consuming, I’ll swing back and do it after the dictionaries are cleaned, and then update them accordingly
I have just finished uploading the cleaned, disambiguated and Wikidata-attributed activities dictionary, and updated its description, as well as the master INDEX of descriptions.
Description: A dictionary of the names of 438 essential oil or constituent compound biochemical and/or biological activities, 340 of which resolved to wikidata IDs, and 336 with short descriptions.
Filename: activity.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/activity/activity.xml
activity.xml and ActivityDictionaryDescription.md are now updated and working.
I have also updated master INDEXofOIL186Dictionaries.md
As of today, I believe this dictionary and its description document are complete. Below I will copy the contents of the description document:
EO Activity Dictionary
File Data
Description: A dictionary of 438 essential oil or constituent compound biochemical and/or biological activities, 340 of which resolved to wikidata IDs, and 336 with descriptions of 250 characters or less.
Filename: eoActivity.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoActivity/eoActivity.xml
Table Column Headings
id: serialized identification number
term: The term is a human-readable string describing the concept.
wikidataID: Unique identifier linked to Wikidata.org — a free and open knowledge base that can be read and edited by both humans and machines.
description: short description of the activity sourced from wikidata and/or wikipedia
No. of source papers: 186
No. of entries (Headers are not counted): 438
No. of unique activity names (including alternate spellings or synonyms): 438
No. of activities resolved in wikidata (including alternate spellings or synonyms): 340
Number of unique wikidata ids attributed to activities (normalizing for alternate spellings and synonyms): 250
No. of entries without wikidataID: 98
No. of entries with descriptions: 336
No. of entries without descriptions: 102
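The counts above are internally consistent: 340 resolved + 98 unresolved = 438 entries, and 336 with + 102 without descriptions = 438 entries. A rough Python sketch of how such figures could be recomputed from the XML follows. It assumes each entry is an `<entry>` element whose attributes are named after the column headings (`term`, `wikidataID`, `description`); the actual layout of eoActivity.xml may differ, and the sample IDs are placeholders, not real Wikidata IDs.

```python
import xml.etree.ElementTree as ET


def dictionary_stats(xml_text):
    """Count entries, resolved Wikidata IDs and descriptions in a dictionary."""
    root = ET.fromstring(xml_text)
    entries = root.findall(".//entry")
    # Assumption: attribute names mirror the column headings.
    ids = [e.get("wikidataID", "").strip() for e in entries]
    resolved = [i for i in ids if i]
    described = [e for e in entries if e.get("description", "").strip()]
    return {
        "entries": len(entries),
        "unique_terms": len({e.get("term", "").strip() for e in entries}),
        "resolved": len(resolved),
        "unique_wikidata_ids": len(set(resolved)),
        "unresolved": len(entries) - len(resolved),
        "with_description": len(described),
        "without_description": len(entries) - len(described),
    }


# Made-up sample; "Q_EXAMPLE_1" is a placeholder, not a real Wikidata ID.
SAMPLE = """<dictionary title="sample">
  <entry id="1" term="antibacterial" wikidataID="Q_EXAMPLE_1" description="acts against bacteria"/>
  <entry id="2" term="anti-bacterial" wikidataID="Q_EXAMPLE_1" description=""/>
  <entry id="3" term="antiacne"/>
</dictionary>"""

stats = dictionary_stats(SAMPLE)
```

In the sample, two spellings share one placeholder ID, which is the situation the "unique wikidata ids" figure is meant to normalize for.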