CEVOpen
📚 Documentation: MASTER INDEX of Dictionary Descriptions for Oil186 test batch
Here we describe the process of:
- creating a master INDEX (INDEXofOIL186Dictionaries.md) of [DictionaryName]DictionaryDescription.md documents, which will describe the contents of the individual dictionaries created to date from data collected for Oil186,
- creating individual "DictionaryDescription" documents for each dictionary — which will each have their own Github Issue number, to facilitate discussion and correction.
I started the task of creating individual Dictionary Description Documentation (“DDD”) for each dictionary with the following steps:
Since there were a lot of .tsv and .csv files in (A), I first created a new directory in https://github.com/petermr/CEVOpen/tree/master/articleAnalysis/oil186/raw … called “DictionaryDuplicateTablesOrganized”
Copied duplicates of them within the following sub-directories: https://github.com/petermr/CEVOpen/tree/master/articleAnalysis/oil186/raw/DictionaryDuplicateTablesOrganized
- Examined the contents of each file in each (now sorted) directory and — hopefully — picked the right ones to begin drafting DDDs for each — along with an “AboutOIL186Dictionaries.md” master description document — all of which can be found here: https://github.com/petermr/CEVOpen/tree/master/articleAnalysis/oil186/raw/DictionaryDescriptionsOIL186
- AboutOIL186Dictionaries.md
- ChemicalConstituentsDictionaryDescription.md
- PlantOriginDescription.md
- ExtractionAndChemicalAnalysisMethodsDictionaryDescription.md
- TargetOrganismDictionaryDescription.md
I will provide further details as updates are made.
Clarification requested:
@petermr Should I actually be writing up Dictionary Descriptions as requested for the items in here: A) https://github.com/petermr/CEVOpen/tree/master/articleAnalysis/oil186/raw … or for these ones I just found here: B) https://github.com/petermr/CEVOpen/tree/master/dictionary ?
Direction requested:
Please have a look and provide feedback as to:
- Have I chosen the correct tables to describe? If not, please point me to the right ones. (eg. the Chemical Constituents file I chose was the only one with wikidataIDs, but had very few entries).
- What you want me to name the files. (Please check the names of the dictionaries themselves: the titles inside the dictionary documents, their names in the "AboutOIL186Dictionaries.md" file, and the .md file names themselves.)
- Where you want them to be posted
- Would you like any changes to the formatting?
Please note:
- The source files for the descriptions are in the .md documents
- I have pasted questions for you at the bottom of some of them.
Thank you.
On Fri, Jan 24, 2020 at 3:43 PM Emanuel Faria wrote:
I started the task of creating individual Dictionary Description Documentation (“DDD”) for each by the following steps:
Since there were a lot of .tsv and .csv files in (A), I first created a new directory in https://github.com/petermr/CEVOpen/tree/master/articleAnalysis/oil186/raw … called “DictionaryDuplicateTablesOrganized”
Yes, there was no system in naming the files so there are almost certainly duplicates. Important to try to identify the latest one.
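One way to shortlist the latest copy of each table is to compare the date suffix that some of the file names already carry (e.g. country20191222.tsv, plantParts20191014.xml). The sketch below is only an illustration: it assumes the suffix is a YYYYMMDD date that reflects when the file was last edited, the helper name `latest_versions` is made up, and `country20191014.tsv` is a hypothetical older duplicate added to show the comparison. Files without a date still need to be checked by hand.

```python
import re

# Matches names such as "country20191222.tsv": base name, 8-digit date, extension.
DATED = re.compile(r"^(?P<base>.+?)(?P<date>\d{8})\.(?P<ext>\w+)$")


def latest_versions(filenames):
    """Return (newest dated file per base name, files with no date suffix)."""
    latest = {}
    undated = []
    for name in filenames:
        match = DATED.match(name)
        if match is None:
            undated.append(name)
            continue
        key = (match.group("base"), match.group("ext"))
        date = match.group("date")
        # YYYYMMDD strings sort chronologically when compared as text.
        if key not in latest or date > latest[key][0]:
            latest[key] = (date, name)
    return sorted(name for _, name in latest.values()), undated


files = [
    "country20191222.tsv",
    "country20191014.tsv",  # hypothetical older duplicate, for illustration only
    "plantParts20191014.xml",
    "instrument.xml",
]
newest, undated = latest_versions(files)
print(newest)
print(undated)
```

A date in the file name does not guarantee that the content is the most complete, so the result is a starting point for manual comparison rather than a decision.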
Copied duplicates of them within the following sub-directories:
Looks appropriate.
- Examined the contents of each file in each (now sorted) directory and — hopefully — picked the right ones to begin drafting DDDs for each — along with an “AboutOIL186Dictionaries.md” master description document — all of which can be found here: https://github.com/petermr/CEVOpen/tree/master/articleAnalysis/oil186/raw/DictionaryDescriptionsOIL186
- AboutOIL186Dictionaries.md
- ChemicalConstituentsDictionaryDescription.md
- CountryDictionaryDescription.md
- ExtractionAndChemicalAnalysisMethodsDictionaryDescription.md
- TargetOrganismDictionaryDescription.md
I will provide further details as updates are made.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/petermr/CEVOpen/issues/74?email_source=notifications&email_token=AAFTCS24FR27KDDFPPX72ZDQ7MEANA5CNFSM4KLHT7W2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ3GCQQ#issuecomment-578183490, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFTCSYEYBJDJLLFLRXMQQTQ7MEANANCNFSM4KLHT7WQ .
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
On Fri, Jan 24, 2020 at 3:49 PM Emanuel Faria wrote:
Clarification requested:
@petermr Should I actually be writing up Dictionary Descriptions as requested for the items in here: A) https://github.com/petermr/CEVOpen/tree/master/articleAnalysis/oil186/raw … or for these ones I just found here: B) https://github.com/petermr/CEVOpen/tree/master/dictionary ?
Note that "tree/master/" chunk is an artefact of Github and won't appear on your disk
B) is the production version, but you should check if there is an obviously larger or newer/cleaner version in A);
A dictionary has a list of entries like:
... each entry MUST have a term and SHOULD have a Wikidata ID. It MAY have a name (which is often the same as the term, but not always). Ideally they should all have IDs. The description is normally the Wikidata description.
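The MUST/SHOULD/MAY rules above could be checked mechanically. The following is a minimal sketch, not the project's own validator: the `entry/@term` and `entry/@name` attributes come from this thread, but the attribute name `wikidataID`, the sample entries, and the placeholder IDs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary content; the IDs are placeholders, not real Wikidata items.
SAMPLE = """<dictionary title="example">
  <entry term="e. coli" name="Escherichia coli" wikidataID="Q0000001"/>
  <entry term="thyme oil"/>
  <entry name="entry without a term"/>
</dictionary>"""


def check_entries(xml_text, id_attr="wikidataID"):
    """Return (errors, warnings) for the entries in one dictionary."""
    root = ET.fromstring(xml_text)
    errors, warnings = [], []
    for index, entry in enumerate(root.iter("entry"), start=1):
        term = entry.get("term")
        if not term:
            # MUST: every entry needs a term.
            errors.append(f"entry {index}: missing term")
            continue
        if not entry.get(id_attr):
            # SHOULD: a missing ID is reported but is not fatal.
            warnings.append(f"{term}: no Wikidata ID")
        # MAY: a missing name is acceptable, so it is not checked.
    return errors, warnings


errors, warnings = check_entries(SAMPLE)
print(errors)
print(warnings)
```

Keeping errors and warnings separate mirrors the difference between MUST and SHOULD: a dictionary with warnings is still usable, but still has entries left to resolve.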
Direction requested:
Please have a look and provide feedback as to:
- Have I chosen the correct tables to describe? If not, please point me to the right ones. (eg. the Chemical Constituents file I chose was the only one with wikidataIDs, but had very few entries).
The dictionaries should end up in https://github.com/petermr/CEVOpen/[tree/master/]dictionary https://github.com/petermr/CEVOpen/tree/master/dictionary
- What you want me to name the files
For the dictionary, use the title given inside the file; e.g. CEVOpen/dictionary/targetOrganism/targetOrganism.xml (https://github.com/petermr/CEVOpen/tree/master/dictionary/targetOrganism) starts
The "targetOrganism" is the name of the file (+.xml) and also the title of the dictionary. If they are different the software will throw an error.
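A quick pre-check for this rule could look like the sketch below. It assumes the title is stored as a `title` attribute on the root element of the dictionary file, which this thread does not show, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical minimal dictionary; the "title" attribute on the root is an assumption.
SAMPLE = '<dictionary title="targetOrganism"></dictionary>'


def title_matches_filename(path, xml_text):
    """True when the dictionary title equals the file name without ".xml"."""
    title = ET.fromstring(xml_text).get("title")
    return title == Path(path).stem


print(title_matches_filename("dictionary/targetOrganism/targetOrganism.xml", SAMPLE))
print(title_matches_filename("dictionary/targetOrganism/targetOrganisms.xml", SAMPLE))
```

The comparison is case-sensitive on purpose, since "TargetOrganism" and "targetOrganism" are different strings even though some file systems would treat the file names as the same.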
- Where you want them to be posted
- Would you like any changes to the formatting?
Please note:
- The source files for the descriptions are in the .md documents
Please put the links in the issue so I can go straight there...
- I have pasted questions for you at the bottom of some of them.
Thank you.
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
Working on description document for compounds.xml (Draft of CompoundDictionaryDescription.md in the same folder now. It was made with the texts.app I told you about, @petermr ... look ok to you?).
What are the definitions for the following, please:
/desc /entry/@name /entry/@term
Is there information missing from this mail?
On Fri, Jan 24, 2020 at 11:18 PM Emanuel Faria wrote:
What are the definitions for the following, please:
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
Whoops. Yes.... added links to files below.
Working on the description document for compounds.xml. The draft of CompoundDictionaryDescription.md is in the same folder now. (It was made with the free WYSIWYG markdown editor I told you about, @petermr ... look ok to you?)
I don't know how to distinguish/describe the definitions for the following column headings. Can you help with that?
/entry/@name /entry/@term
The term is the precise string used to identify the concept. The name is a human-readable string describing the concept. They are often the same.
On Sat, 25 Jan 2020, 15:58 Emanuel Faria wrote:
Whoops. Yes.... added links to files below.
Working on description document for compounds.xml https://github.com/petermr/CEVOpen/blob/master/dictionary/compound/compound.xml For the draft of CompoundDictionaryDescription.md https://github.com/petermr/CEVOpen/blob/master/dictionary/compound/CompoundDictionaryDescription.md in the same folder now. (It was made with the free WYSIWYG markdown editor http://www.texts.io/ I told you about, @petermr https://github.com/petermr ... look ok to you?).
I don't know how to distinguish/describe the definitions for the following column headings. Can you help with that?
/entry/@name /entry/@term
Thanks Peter. Compound Dictionary description is now ready for review. https://github.com/petermr/CEVOpen/blob/master/dictionary/compound/CompoundDictionaryDescription.md
Interestingly, it contains a table of contents at the top of the page, which I did not create. Does github do this by default, or was it the WYSIWYG editor I'm using?
I've just posted draft DictionaryDescriptions for the dictionary .xml files I could find.
Location of Main Description of Descriptions .md
The main document that provides a description of all the DictionaryDescriptions is AboutOIL186Dictionaries.md. From there, you can click on the name of any of the sub-sub-headings that end with .md to get to the individual DictionaryDescription for that topic.
Location of Individual Descriptions files
Because there were two sources of .xml files to work with (either in CEVOpen/tree/master/dictionary or CEVOpen/tree/master/articleAnalysis/oil186/raw), I have stored the individual DictionaryDescription .md files accordingly in:
- https://github.com/petermr/CEVOpen/tree/master/articleAnalysis/oil186/raw/DictionaryDuplicateTablesOrganized or
- https://github.com/petermr/CEVOpen/tree/master/dictionary
(Remember: I created the directory /DictionaryDuplicateTablesOrganized and copied the existing files in https://github.com/petermr/CEVOpen/tree/master/articleAnalysis/oil186/raw/ in order to better organize them for my work on creating these dictionaries.)
Heads up
Currently, there are notes at the bottom of each of the individual dictionaries (things to fix, clean up, consider, decide, etc.). I will now begin copying the contents of each of them, including their notes, into separate comment entries for discussion and instruction for correction.
EDIT: On second thought... I'll paste the contents of the master description of descriptions below, and begin new issues for the individual ones. It will be easier to manage the conversation about corrections that way.
[Index of the OIL186 Dictionaries](https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/DictionaryDescriptionsOIL186/INDEXofOIL186Dictionaries.md)
This document contains information about the Manually Created Dictionaries for OIL186.
The purpose/function of Dictionaries:
Identify objects/concepts (e.g. "e.coli" is a concept).
Give each object clear lexical names by which it can be searched. (An object that goes by more than one name has synonyms.)
Give each object a link to wikidata (or other authorities) by which we can learn more about them.
PLEASE NOTE: Rather than in alphabetical order, the dictionaries are listed here in a logical progression: Plants -> Extracts -> Testing Methods and Instruments -> Results Analysis -> Activities -> Target Organisms the activities were tested upon -> Diseases related to those target organisms
Layman and Botanical Names / Species
Description: A dictionary of 1678 constituent chemical compounds extracted from Essential Oils mentioned in the 186 test articles downloaded from PubMed. Of the 1678 entries, ?????? had their names normalized and tagged with corresponding Wikidata IDs, the other 112 remain to be resolved.
Filename: OilPlant.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/plant/oilplant.xml
Plant Parts
The plant part or parts from which the mentioned oils are extracted
Description: A dictionary of [XX] part(s) of a plant from which Essential Oils — mentioned in the 186 test articles downloaded from PubMed — were extracted.
Filename: plantParts20191014.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/plantparts/raw/plantParts20191014.xml
The geographical origins of the harvested plant material
Description: A dictionary of 46 countries of origin mentioned in the 186 source articles for plants being tested.
Filename: country20191222.tsv
File Location: https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/country20191222.tsv
Plant Material History
Description: A dictionary of [XX] plant processes from which Essential Oils — mentioned in the 186 test articles downloaded from PubMed — were harvested.
Filename: process20191014.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/process/process20191014.xml
EO Extraction and Chemical Analysis Methods
Equipment, methods and materials used for EO extraction
Description: A dictionary of 6 Methods of Essential Oil extraction and 6 Types of Chemical Analysis, mentioned in the 186 source articles for plant extracts being tested.
Filename: methodAndAnalysisExtraction20191225.tsv
File Location: https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/methodAndAnalysisExtraction20191225.tsv
EO Analysis Instruments
Description: A dictionary of [24] makes/models of Gas chromatography–mass spectrometry equipment used to identify different substances within a test sample — in this case, Essential Oils mentioned in the 186 test articles downloaded from PubMed.
Filename: instrument.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/instrument/raw/instrument.xml
EO Chemical Analysis Results - Constituents and Concentrations
Essential Oils (EOs) are concentrated hydrophobic liquids containing volatile chemical compounds extracted from plants. Essential oils are also known as volatile oils, ethereal oils, aetherolea, or simply as the oil of the plant from which they were extracted, such as oil of clove.
Qualitative (constituent compounds) and quantitative (%) analysis of the chemical composition of the tested Essential Oils (Extracts?), with each known compound linked to its IUPAC International Chemical Identifier (InChI).
Description: A dictionary of 2114 constituent chemical compounds extracted from Essential Oils mentioned in the 186 test articles downloaded from PubMed. Of the 2114 entries, 1010 had their names normalized and tagged with corresponding Wikidata IDs, the other 1104 remain to be resolved.
Filename: compound.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/compound/compound.xml
EO Activities
Tested biochemical and/or biological activities, and where available, their measured results.
Description: A dictionary of 184 activities mentioned in the 186 test articles downloaded from PubMed.
Filename: activity.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/activity/activity.xml
Target Organisms
The organisms used as targets of experiments conducted to determine what effect(s) (Activities) tested EOs may have on them. They may occur as A) single-cells or colonies, such as bacteria, fungi, yeasts and molds, protozoa, algae, or viruses; B) insects such as mosquitos, flies, etc.; or, C) they may be helminths, such as Nematodes (roundworms), Cestodes (tapeworms), and Trematodes (flukes).
Description: A dictionary of [55] organisms mentioned [as subjects of experiment?] in the 186 test articles downloaded from PubMed.
Filename: targetOrganism.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/targetOrganism/targetOrganism.xml
Description: A dictionary of 133 microorganisms mentioned in tests + WikidataID + frequencies (the number of times the organisms occurred in the 186 source papers)
Filename: TargetOrganismCount.csv
File Location: https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/targetOrganismCount.csv
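For reference, a frequency column of this kind could be produced roughly as sketched below. This is an illustration only and not necessarily how TargetOrganismCount.csv was generated: the function name, the sample sentences, and the choice of case-insensitive whole-phrase matching are all assumptions.

```python
import re
from collections import Counter


def count_terms(terms, texts):
    """Count case-insensitive whole-phrase occurrences of each term across all texts."""
    counts = Counter()
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(len(pattern.findall(text)) for text in texts)
    return counts


# Hypothetical snippets standing in for the text of the source papers.
texts = [
    "Escherichia coli was inhibited. Candida albicans was not.",
    "The oil was active against escherichia coli.",
]
counts = count_terms(["Escherichia coli", "Candida albicans", "Aspergillus niger"], texts)
for term, n in counts.most_common():
    print(term, n)
```

Exact-phrase matching misses abbreviations such as "E. coli", which is one reason a dictionary entry may need synonyms in addition to its term.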
Text for definitions goes here
This dictionary does not yet exist
FYI: As I clean up each [dictionary].xml file and update their unique [DictionaryName]DictionaryDescription.md files, I have also updated the master INDEX of Oil186 Dictionary Descriptions here: (INDEXofOIL186Dictionaries.md)
As of today, we have 11 finished dictionaries. They are:
- eoActivity
- eoAnalysisMethod
- eoCompound
- eoExtractionMethod
- eoPlant
- eoPlantMaterialHistory
- eoPlantPart
- eoTargetOrganism
- geoLocation
- humanDiseases
- pests
... as well as a master INDEX of their descriptions, pasted below:
Index Oil186 Dictionaries
This index contains information about the Manually Created Dictionaries for OIL186.
PLEASE NOTE: Rather than in alphabetical order, the dictionaries are listed here in a logical progression.
The purpose/function of Dictionaries:
Identify “things” as objects or concepts (e.g. "e.coli" is a concept).
Give each object clear lexical names by which it can be searched.
(An object that goes by more than one name has synonyms.)
Give each object a link to wikidata (or other authorities) by which we can learn more about them.
EO Plant
Description: A dictionary of 1678 plant names mentioned in the 186 test articles downloaded from PubMed. Of the 1678 entries, 1567 had their names normalized and tagged with corresponding Wikidata IDs.
Filename: eoPlant.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoPlant/eoPlant.xml
EO Plant Part
Description: A dictionary of 285 plant part terms.
Filename: eoPlantPart.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoPlantPart/eoPlantPart.xml
Geo Location
Description: A dictionary of 9568 entries for geolocations including country, countryISOcode, city, latitude, longitude, postal code and time zone sourced from http://www.ip2location.com, along with data augmenting Indian States-Cities, created and maintained over the years, obtained at https://network.convergenceservices.in/forum/12-joomla-development/4305-mysql-tables-for-country-states-and-indian-states-cities.html.
License information: This site or product includes IP2Location LITE data available from http://www.ip2location.com
Filename: geoLocation.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/geoLocation/geoLocation.xml
EO Plant Material History
Description: A dictionary of 81 entries relating to the plant material history leading up to the extraction of Essential Oils mentioned in selected literature chosen from the 186 test articles downloaded from PubMed. The entries include key words and phrases describing: growth conditions, plant life stages, plant material selection, post-harvest treatment methods, and extracted plant material products. Of the 82 entries, 58 were resolved to WikidataIDs.
Filename: eoPlantMaterialHistory.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoPlantMaterialHistory/eoPlantMaterialHistory.xml
EO Extraction Method
Description: A dictionary of 87 terms for Essential Oil extraction methods and apparatus.
Filename: eoExtractionMethod.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoExtractionMethod/eoExtractionMethod.xml
EO Analysis Method
Analytical chemistry studies and uses instruments and methods used to separate, identify, and quantify matter. In practice, separation, identification or quantification may constitute the entire analysis or be combined with another method. Separation isolates analytes. Qualitative analysis identifies analytes, while quantitative analysis determines the numerical amount or concentration.
Analytical chemistry consists of classical, wet chemical methods and modern, instrumental methods. Classical qualitative methods use separations such as precipitation, extraction, and distillation. Identification may be based on differences in color, odor, melting point, boiling point, radioactivity or reactivity. Classical quantitative analysis uses mass or volume changes to quantify amount. Instrumental methods may be used to separate samples using chromatography, electrophoresis or field flow fractionation. Then qualitative and quantitative analysis can be performed, often with the same instrument and may use light interaction, heat interaction, electric fields or magnetic fields. Often the same instrument can separate, identify and quantify an analyte.
(Source: https://en.wikipedia.org/wiki/Analytical_chemistry)
Description: A dictionary of 117 entries describing instruments and methods used to separate, identify, and quantify matter — 105 being resolved to wikidata IDs, and 95 with short descriptions.
Filename: eoAnalysisMethod.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoAnalysisMethod/eoAnalysisMethod.xml
EO Compound
Essential Oils (EOs) are concentrated hydrophobic liquids containing volatile chemical compounds extracted from plants. Essential oils are also known as volatile oils, ethereal oils, aetherolea, or simply as the oil of the plant from which they were extracted, such as oil of clove.
Qualitative (constituent compounds) and quantitative (%) analysis of the chemical composition of the tested Essential Oils (Extracts?), with each known compound linked to its IUPAC International Chemical Identifier (InChI).
Description: A dictionary of 2114 constituent chemical compounds extracted from Essential Oils converted from essoldb1.0 data. Of the 2114 entries, 1010 had their names normalized and tagged with corresponding Wikidata IDs, the other 1104 remain to be resolved as no Wikidata IDs currently exist for them.
Filename: eoCompound.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoCompound/eoCompound.xml
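Figures like "2114 entries, 1010 resolved, 1104 unresolved" can be re-derived from the file whenever it changes, so the description does not drift out of date. The sketch below shows one way to do that; the attribute name `wikidataID`, the sample entries, and the placeholder IDs are assumptions, and the real eoCompound.xml may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-entry dictionary; the IDs are placeholders, not real Wikidata items.
SAMPLE = """<dictionary title="example">
  <entry term="compound a" wikidataID="Q0000001"/>
  <entry term="compound b" wikidataID="Q0000002"/>
  <entry term="compound c"/>
</dictionary>"""


def resolution_summary(xml_text, id_attr="wikidataID"):
    """Count total, resolved (has an ID) and unresolved entries."""
    entries = ET.fromstring(xml_text).findall("entry")
    resolved = sum(1 for entry in entries if entry.get(id_attr))
    return {
        "total": len(entries),
        "resolved": resolved,
        "unresolved": len(entries) - resolved,
    }


summary = resolution_summary(SAMPLE)
print(summary)
```

Because "unresolved" is computed as total minus resolved, the three numbers in a description generated this way always add up.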
EO Activity
Description: A dictionary of 438 essential oil or constituent compound biochemical and/or biological activities, 340 of which resolved to wikidata IDs, and 336 with descriptions of 250 characters or less.
Filename: eoActivity.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoActivity/eoActivity.xml
EO Target Organism
The organisms used as targets of experiments conducted to determine what effect(s) (Activities) tested EOs may have on them. They may occur as A) single-cells or colonies, such as bacteria, fungi, yeasts and molds, protozoa, algae, or viruses; B) insects such as mosquitos, flies, etc.; or, C) they may be helminths, such as Nematodes (roundworms), Cestodes (tapeworms), and Trematodes (flukes).
Description: A dictionary of terms describing 307 target organisms resolved to wikidataIDs (including genus and species of bacteria, fungi, protists, protozoa, and other microorganisms), with 154 terms including names of related diseases.
Filename: eoTargetOrganism.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoTargetOrganism/eoTargetOrganism.xml
Human Diseases
Description: A dictionary of 3412 terms related to human diseases.
Filename: humanDiseases.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/humanDiseases/humanDiseases.xml
Pests
Description: A dictionary of 1032 terms for two categories of insects: A) Insect vectors of human pathogens sourced from https://en.wikipedia.org/wiki/Category:Insect_vectors_of_human_pathogens, and B) Winged insects sourced from https://www.insectidentification.org/winged-insect-key.asp
Filename: pests.xml
File Location: https://github.com/petermr/CEVOpen/blob/master/dictionary/pests/pests.xml