
📕 Documentation: Dictionary.xml and DictionaryDescription.md of: eoActivity

Open EmanuelFaria opened this issue 5 years ago • 32 comments

The Big WHY — Dictionary: Activities | Extended list/table for normalization

We are building the ACTIVITIES DICTIONARY so that:

Type of User: Verriclear Natural Skin Essentials™ can: confidently choose essential oil ingredients that perform HIGHLY SPECIFIC desired phytomedicinal activities optimally, and possess desired chemical properties (like absorption rate, pleasing or neutral fragrance) without: introducing undesirable activities and chemical properties (e.g. skin irritants, carcinogenic, toxic, etc.)

Goals: Describe the Challenge, the solution we will bring, and the Desired End State by which all will know we have achieved excellence.

  • A. Deliver a diverse and useful set of activities that will serve as keywords when searching the literature, as well as tags to be associated with plants, essential oils, and their constituents

Desired Results: A clear and concise description / outline of the final "state or vision" of the project — the evidence we will see when our goals are achieved.

  • A. Identify and cross-reference as many specific Activity Classes, Activity Action Types, and Activity Targets as possible from the relevant fields in the provided RAW data table
  • B. Normalize their names and synonyms/aliases
  • C. Add Wikidata or other relevant IDs
  • D. Capture Activity descriptions for each

Guiding principles: What principles will guide our decisions as we do our part to fulfill the mission?

  • A. Review the notes in the column headings as well as the comments related to specific records. If you have questions, ask.

Responsibilities and Roles: Who will have what completed when?

  • A. @mannyrules / Verriclear will provide the RAW expanded list of activities to be cross-referenced and normalized
  • B. @petermr will analyze the RAW data and deliberate with Emanuel and other experts on how best to organize the data
  • C. @ambarishK will perform the cross-referencing and normalization. (Once the final version is approved, please check with @gita to update EssOilDb entries that had multiple Activities assigned to single entries, as noted in the RAW table.)

Tips, Tools, Shortcuts and Resources: Anything done or used to make the desired outcome more likely to occur.

  • A. @ambarishK: Please search for synonyms in this column too. Hopefully, you can widen the scope by changing the suffixes of the words (e.g. Carcinogen, Carcinogenic, Carcinogenicity). Also, try with and without hyphens (e.g. Anti-Viral and antiviral).
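A minimal sketch of the two widening tricks suggested above (suffix changes and hyphen variants), assuming the terms arrive as plain strings. The function names and the suffix list are hypothetical; the suffixes only reproduce the Carcinogen example and would need extending per stem.

```python
import re

# Hypothetical suffix list: it only reproduces the Carcinogen/Carcinogenic/Carcinogenicity example.
SUFFIXES = ("", "ic", "icity")


def spelling_key(term: str) -> str:
    """Collapse case, hyphens and spaces so 'Anti-Viral' and 'antiviral' compare equal."""
    return re.sub(r"[\s\-]+", "", term).lower()


def hyphen_variants(term: str) -> set:
    """Spellings worth searching for: hyphenated, closed up, and with a space."""
    lower = term.lower()
    return {lower, lower.replace("-", ""), lower.replace("-", " ")}


def suffix_variants(stem: str, suffixes=SUFFIXES) -> list:
    """Append each suffix to a stem, e.g. carcinogen -> carcinogenic."""
    return [stem + suffix for suffix in suffixes]


def group_by_key(terms) -> dict:
    """Group raw terms that differ only in case, hyphens or spaces."""
    groups = {}
    for term in terms:
        groups.setdefault(spelling_key(term), []).append(term)
    return groups


print(group_by_key(["Anti-Viral", "antiviral", "Analgesic"]))
# {'antiviral': ['Anti-Viral', 'antiviral'], 'analgesic': ['Analgesic']}
```

Grouping by a collapsed key only proposes candidate synonyms; deciding which spelling becomes the normalized name is still a manual step.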

EmanuelFaria avatar Oct 03 '19 03:10 EmanuelFaria

OK sir.

ambarishK avatar Oct 03 '19 09:10 ambarishK

Thanks - this is shaping well


-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

petermr avatar Oct 03 '19 09:10 petermr

Hello! Now going to process (normalize) the activity sheet: https://github.com/petermr/CEVOpen/blob/master/dictionary/activity/raw/Manny's%20Activity%20Table%20RAW%20for%20Ambarish%202019-10-02.tsv

ambarishK avatar Oct 03 '19 11:10 ambarishK

Please don't create long filenames with spaces or punctuation in them. This file could be named:

.../raw/activityClassifcation20191001.tsv

variants should simply have a date:

/raw/activityClassifcation20191003.tsv

so we can tell it's a derivative of previous ones.

Please do NOT include "Manny", "Ambarish" "New", etc.

Also document these in README.md files.

  • where did they come from?
  • what changed in this version?
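The naming rule above can be checked mechanically. This is a small sketch, assuming the convention is a letters-only stem followed by an 8-digit YYYYMMDD date and `.tsv`; the helper names and the example stem are hypothetical. The banned-word check is a plain substring test, so a stem such as "renewal" would be flagged for "new" and needs a human look.

```python
import re
from datetime import date

# Assumed convention, following the guidance above: letters-only stem + YYYYMMDD + .tsv
RAW_NAME = re.compile(r"^[A-Za-z]+\d{8}\.tsv$")
# Words the guidance says must not appear in file names (substring match, so expect false positives).
BANNED_WORDS = ("manny", "ambarish", "new")


def raw_filename(stem: str, day: date) -> str:
    """Build e.g. activityClassification20191003.tsv from a stem and a date."""
    if not stem.isalpha():
        raise ValueError(f"stem must contain letters only: {stem!r}")
    return f"{stem}{day:%Y%m%d}.tsv"


def check_raw_filename(name: str) -> list:
    """Return a list of problems; an empty list means the name looks fine."""
    problems = []
    if not RAW_NAME.match(name):
        problems.append("not <letters><YYYYMMDD>.tsv (spaces or punctuation?)")
    lowered = name.lower()
    problems += [f"contains {word!r}" for word in BANNED_WORDS if word in lowered]
    return problems
```

The README.md entry (where the file came from, what changed) still has to be written by hand; the check only covers the name.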

petermr avatar Oct 03 '19 14:10 petermr

I've uploaded the correctly-named tsv (without the notes in the column header) and added a README.md file too. But the readme won't format properly, so I must have done something wrong.

EmanuelFaria avatar Oct 03 '19 18:10 EmanuelFaria

Let me go through the notes in the column headers of the file and correlate them with each description in README.md.

ambarishK avatar Oct 03 '19 18:10 ambarishK

Thanks Ambarish

EmanuelFaria avatar Oct 08 '19 15:10 EmanuelFaria

Normalizing the activities right now. I will post it soon.

ambarishK avatar Oct 08 '19 16:10 ambarishK

@ambarishK Please let me know if you have ANY questions at all about what I provided. Happy to help make your work less challenging or time-consuming

EmanuelFaria avatar Oct 08 '19 16:10 EmanuelFaria

I am finding synonyms. I will let you know when it is done.

Do we have to keep these as activities?

Calcium
Chloride
Chromium
Copper
Fluoride
Iodine
Iron
Magnesium
Manganese
Molybdenum
Phosphorus
Potassium
Selenium
Sodium
Sulphur
Zinc
Omega-3: Alpha-linolenic Acid (18:3)
Omega-3: Docosahexaenoic Acid (Dha, 22:6)
Omega-3: Docosapentaenoic Acid (22:5)
Omega-3: Eicosapentaenoic Acid (Epa, 20:5)
Omega-3: Eicosatetraenoic Acid (20:4)
Omega-3: Stearidonic Acid (18:4)
Omega-6: Adrenic Acid (22:4)
Omega-6: Arachidonic Acid (Aa, Ara) (20:4)
Omega-6: Calendic Acid (18:3)
Omega-6: Dihomo-gamma-linolenic Acid (Dgla) (20:3)
Omega-6: Docosadienoic Acid (22:2)
Omega-6: Eicosadienoic Acid (20:2)
Omega-6: Gamma-linolenic Acid (Gla) (18:3)
Omega-6: Linoleic Acid (La) (18:2)
Omega-6: Osbond Acid (22:5)
Omega-6: Tetracosapentaenoic Acid (24:5)
Omega-6: Tetracosatetraenoic Acid (24:4)
Omega-9: Elaidic Acid (18:1)
Omega-9: Erucic Acid (22:1)
Omega-9: Gondoic Acid (20:1)
Omega-9: Mead Acid (20:3)
Omega-9: Nervonic Acid (24:1)
Omega-9: Oleic Acid (18:1)
Omega-9: Palmitic Acid C16:0 14%
Omega-9: Stearic Acid C18:1 36%
Omega-9: Ximenic Acid (26:1)
Vitamin A: Retinol
Vitamin B1: Thiamine
Vitamin B12: Cobalamin
Vitamin B2: Riboflavin
Vitamin B3: Niacin
Vitamin B5: Pantothenic Acid
Vitamin B6: Pyridoxine
Vitamin B7: Biotin
Vitamin B9: Folate
Vitamin C: Ascorbic acid (C6H8O6)
Vitamin E: Tocopherols
Vitamin F:
Vitamin K1: Phylloquinone
Vitamin K2: Menaquinone

  • I think we will have to drop the old caids and assign a new activity ID to each unique activity record!

ambarishK avatar Oct 08 '19 17:10 ambarishK

No. I called those "Nutritive" for my own personal use, but I don't think "Nutritive" is officially an activity. Happy to find out I'm wrong however.

Edit: Spelling: "Nutritive": "relating to nutrition. providing nourishment; nutritious"

EmanuelFaria avatar Oct 08 '19 17:10 EmanuelFaria

OK. I am keeping them in the Original EssOilDb data column.

ambarishK avatar Oct 08 '19 17:10 ambarishK

I would omit these. An activity is something that changes the state of an organism.


petermr avatar Oct 08 '19 17:10 petermr

Agreed. If anything, they might be considered "synonyms" or lay terms for the chemical constituents of some oils. I don't know if we're doing that though. Are we @petermr ?

EmanuelFaria avatar Oct 08 '19 17:10 EmanuelFaria

We will record anything that is in the oil. There are few examples where non-terpenes have been reported, especially lipids. This will probably depend on solvent extraction rather than distillation (the lipids are not volatile).


petermr avatar Oct 08 '19 18:10 petermr

Hello Sir.

I have found all synonyms for activities after normalizing them, and added the following columns:

  • uactivity: normalized, unique activities.
  • unique/synonym: match status. F marks the first uniquely found occurrence of an activity, S marks a synonym, and ME marks a cell that contains multiple activity entries.
  • uactivityMatch (next to unique/synonym): found status of unique activities. F means the activity was uniquely found in the original EssOilDb data.

All columns and data of Manny's Activity Table RAW for Ambarish 2019-10-02.tsv are kept as they are.

The file with the additional columns for finding synonyms is ActivityClassificationRAW20191009.tsv

Next is to split cells with multiple activity entries into separate rows.
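Splitting multi-entry cells into one row per activity could look like the sketch below. It assumes the entries in a cell are separated by `;`, `,` or ` / `, which has not been checked against the RAW table; hyphens are deliberately not treated as separators. The `caname` column name comes from this thread, and the helper names are hypothetical.

```python
import re

# Assumption: several activities in one cell are separated by ";", "," or " / ".
# Hyphens are deliberately NOT separators, so "ACE-inhibitor" stays whole.
SEPARATORS = re.compile(r"\s*(?:[;,]|\s/\s)\s*")


def split_cell(cell: str) -> list:
    """Split one cell into its individual, non-empty activity names."""
    return [part for part in SEPARATORS.split(cell.strip()) if part]


def explode(rows, column: str):
    """Yield one copy of each row per activity found in `column`."""
    for row in rows:
        for activity in split_cell(row[column]):
            yield {**row, column: activity}


for row in explode([{"caname": "Antiviral; Analgesic"}], "caname"):
    print(row)
# {'caname': 'Antiviral'}
# {'caname': 'Analgesic'}
```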

ambarishK avatar Oct 09 '19 09:10 ambarishK

Something has gone wrong here. It's too complicated. All we need is the unique terms (probably in uactivity). Then we need to look them up in Wikidata. Do NOT include the chemicals (elements, acids, vitamins). These are NOT activities. As a first pass, just include uactivity and a unique ID. I will edit out the chemical stuff. The dictionary is simply a list of terms and links to Wikidata. Do NOT attempt to include a classification. Wikidata will provide that.
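The "single column plus unique ID" first pass could be produced as in this sketch. It drops the mineral names listed earlier in the thread and any row starting with "Omega-n:" or "Vitamin X:" (a heuristic that may miss other chemicals), de-duplicates case-insensitively keeping the first spelling, and numbers the rest. The `ACT0001` ID format and all function names are hypothetical.

```python
import re

# Mineral names copied from the list earlier in this thread.
MINERALS = {
    "calcium", "chloride", "chromium", "copper", "fluoride", "iodine", "iron",
    "magnesium", "manganese", "molybdenum", "phosphorus", "potassium",
    "selenium", "sodium", "sulphur", "zinc",
}
# Heuristic for the "Omega-n: ..." and "Vitamin X: ..." rows in that list.
NUTRIENT_PREFIX = re.compile(r"^(?:omega-\d+|vitamin\s+[a-z]\d*)\s*:", re.IGNORECASE)


def is_chemical(term: str) -> bool:
    """True for the nutrient entries that are not activities."""
    t = term.strip()
    return t.lower() in MINERALS or bool(NUTRIENT_PREFIX.match(t))


def activity_list(raw_terms, id_prefix: str = "ACT") -> list:
    """Drop chemicals and blanks, de-duplicate case-insensitively, number the rest.

    The ID format (ACT0001, ...) is hypothetical.
    """
    unique = {}
    for term in raw_terms:
        t = term.strip()
        if t and not is_chemical(t):
            unique.setdefault(t.lower(), t)  # keep the first spelling seen
    return [(f"{id_prefix}{n:04d}", t) for n, t in enumerate(unique.values(), start=1)]
```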


petermr avatar Oct 09 '19 10:10 petermr

OK sir.

ambarishK avatar Oct 09 '19 10:10 ambarishK

Peter, I haven't seen Ambarish's latest update, but I think you may be looking at the extra columns from my original file that I asked him to leave in so I could manually use his updates to revise the tables in my formulation database. I'll take a look when I get home in about an hour (approximately 11h15 Brasilia time). Hang tight.

EmanuelFaria avatar Oct 09 '19 13:10 EmanuelFaria

Hello Manny sir!

Please go through the updated file - ActivityClassificationRAW20191009.tsv. The columns and data are the same as in your file.

ambarishK avatar Oct 09 '19 13:10 ambarishK

Please remove rows 156-212 inclusive. These are not activities.

Look at line 17: caname == bacteriostatic, uactivity = anti-allergic.

This makes no sense. Just create a SINGLE COLUMN of activities and remove all other columns. I will edit that column.


petermr avatar Oct 09 '19 14:10 petermr

Thanks Ambarish. Peter is correct, though. It seems that the normalized names are not in line with the originals. It would be best to keep it simple with the few columns Peter suggests (Column J from my file is the most important column in terms of including activities that were in EssOilDb as well as new ones I collected and need). Sorry I made this more difficult.

EmanuelFaria avatar Oct 09 '19 15:10 EmanuelFaria

You're welcome, Manny sir! I will make the changes as per Peter sir's suggestion and edit my activity file - ActivityClassificationRAW20191009.tsv

ambarishK avatar Oct 09 '19 15:10 ambarishK

Sir, please check the normalized list of activities: ActivitiesNormalizedE1.020191010.tsv

I have excluded all elements, lipids and vitamins.

The total count is 228.

A few records have activities written as a statement or comment, e.g.:

- under testing as a skin penetration enhancer for the transdermal delivery of therapeutic drugs
- used in manufacture of MDMA (ecstasy)
- used in traditional Chinese medicine

Once finalised, I will add Wikidata IDs and run the script to build the dictionary.

ambarishK avatar Oct 10 '19 09:10 ambarishK

Sir, please check the Wikidata ID links for the first few activities. Should I proceed the same way or change the approach?

I am running a Wikidata search for each activity and selecting the most appropriate ID for the query.

The first 15 records are as follows:


| Activities | wikidataid | Description |
|---|---|---|
| Abortifacient | Q323047 | activity agents: abortifacient Agents (Q323047) |
| Acaricide | Q416014 | activity agents: abortifacient Agents (Q323047) |
| ACE-inhibitor | Q288280 | activity agents: abortifacient Agents (Q323047) |
| AChE-Inhibitor | Q63229690 | activity agents: abortifacient Agents (Q323047) |
| Aldose-Reductase Inhibitor | Q4713968 | activity: Aldose reductase inhibitor (Q4713968) |
| Allelochemic | Q39187846 | scientific article: Allelochemic function for a primary metabolite: the case of l-tyrosine hyper-production in Inga umbellifera (Fabaceae). (Q39187846) |
| Allelopathic | | |
| Allergenic | Q58646793 | activity agent: Allergenic Pollen (Q58646793) |
| Analgesic | Q173235 | activity: analgesic (Q173235) |
| Anaphylactic | Q168800 | activity: anaphylaxis (Q168800) |
| Anesthetic | Q4990531 | activity: anesthetic (Q4990531) |
| Anti-acetylcholinesterase | Q52211338 | scientific article: Anti-acetylcholinesterase antibodies display cholinesterase-like activity. (Q52211338) |
| Anti-Acne | Q8106593 | Category:Anti-acne preparations (Q8106593) |
| Anti-allergic | Q50430264 | activity agent: Anti-Allergic Agents (Q50430264) |
| Anti-aggregant | Q67486978 | scientific article: Anti-aggregants in clinical practice (Q67486978) |
| Anti-alzheimer | Q53327702 | scientific article: Editorial: Anti Alzheimer agents. (Q53327702) |

Sir, how can I automate Wikidata ID extraction using a SPARQL query?
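One possible way to automate the lookup is sketched below, assuming exact matching of the lower-cased term against English labels on the public Wikidata Query Service. It drops items that are instances of Q13442814 (scholarly article) and returns every candidate for manual review rather than picking one. Exact label matching will miss items whose label keeps capitals (e.g. acronyms such as ACE) or differs in form (anaphylactic vs anaphylaxis); matching `skos:altLabel` as well would widen it. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WDQS = "https://query.wikidata.org/sparql"
# Items that are instances of Q13442814 (scholarly article) are filtered out.
SCHOLARLY_ARTICLE = "Q13442814"


def sparql_literal(text: str, lang: str) -> str:
    """Quote a string as a language-tagged SPARQL literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def build_label_query(terms, lang: str = "en") -> str:
    """Exact, lower-cased label match for each activity term."""
    values = " ".join(sparql_literal(t.strip().lower(), lang) for t in terms)
    return f"""SELECT ?term ?item ?itemLabel ?itemDescription WHERE {{
  VALUES ?term {{ {values} }}
  ?item rdfs:label ?term .
  FILTER NOT EXISTS {{ ?item wdt:P31 wd:{SCHOLARLY_ARTICLE} . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}"""


def run_query(query: str, user_agent: str) -> dict:
    """Send the query as an HTTP GET; Wikimedia asks clients for a descriptive User-Agent."""
    url = WDQS + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": user_agent, "Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def candidates(results: dict) -> dict:
    """Map each term to its (QID, label, description) candidates for manual review."""
    found = {}
    for b in results["results"]["bindings"]:
        qid = b["item"]["value"].rsplit("/", 1)[-1]
        label = b.get("itemLabel", {}).get("value", "")
        description = b.get("itemDescription", {}).get("value", "")
        found.setdefault(b["term"]["value"], []).append((qid, label, description))
    return found
```

Usage would be along the lines of `candidates(run_query(build_label_query(["Analgesic", "Anesthetic"]), "CEVOpen-activity-sketch/0.1"))`, sending terms in modest batches. The query text can also be pasted into query.wikidata.org to inspect results by hand.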

ambarishK avatar Oct 10 '19 11:10 ambarishK

Please make sure the description is for the correct activity and the WikidataIds match.

Remove scientific articles
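Both checks asked for here can be partly automated. This sketch assumes, as in Ambarish's table, that each description ends with the item's QID in parentheses. It flags a description copied from a different item (as in the Acaricide, ACE-inhibitor and AChE-Inhibitor rows), scientific articles, and Wikimedia category pages. It cannot tell an activity agent (e.g. Allergenic Pollen) from an activity, so a human pass is still needed. The function name is hypothetical.

```python
import re

# Descriptions in the table end with the QID in parentheses, e.g. "(Q173235)".
QID_AT_END = re.compile(r"\((Q\d+)\)\s*$")


def check_row(wikidata_id: str, description: str) -> list:
    """Return review notes for one (wikidataid, Description) pair."""
    notes = []
    kind = description.split(":", 1)[0].strip().lower()  # text before the first colon
    match = QID_AT_END.search(description)
    if not wikidata_id:
        notes.append("missing Wikidata ID")
    elif match is None or match.group(1) != wikidata_id:
        notes.append("description belongs to a different item")
    if "article" in kind:
        notes.append("scholarly article, not an activity")
    if kind == "category":
        notes.append("Wikimedia category page, not an activity")
    return notes
```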

petermr avatar Oct 10 '19 11:10 petermr

On Thu, Oct 10, 2019 at 10:56 AM Ambarish Kumar wrote:

> Sir, please check the normalized list of activities.
>
> I have excluded all elements, lipids and vitamins.

Good

> Total count is 228.
>
> A few records have activities written as a statement or comment, e.g.:
>
>   • under testing as a skin penetration enhancer for the transdermal delivery of therapeutic drugs
>   • used in manufacture of MDMA (ecstasy)
>   • used in traditional Chinese medicine

These are not activities. Activities are normally a single word.

> Once finalised, will run script for dictionary making.
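Entries that read as statements rather than activities can be separated out by a word count, as a first filter before manual review. This is a sketch under an assumed threshold of two words, chosen so that compound names like "Aldose-Reductase Inhibitor" still pass; hyphenated words count once. The function names are hypothetical.

```python
import re

# Words are runs of letters/digits, optionally joined by hyphens ("Aldose-Reductase" counts once).
WORD = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def looks_like_statement(term: str, max_words: int = 2) -> bool:
    """True if a term has more words than a typical activity name (assumed threshold)."""
    return len(WORD.findall(term)) > max_words


def partition(terms, max_words: int = 2):
    """Split terms into (likely activities, statements to review)."""
    keep = [t for t in terms if not looks_like_statement(t, max_words)]
    review = [t for t in terms if looks_like_statement(t, max_words)]
    return keep, review
```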


petermr avatar Oct 10 '19 12:10 petermr

Edited dictionary/activity/raw/ActivitiesNormalizedE1.020191010.tsv to remove non-activity terms.

Now add Wikidata IDs and create the dictionary using ami-dictionary.

petermr avatar Oct 10 '19 17:10 petermr

Sir, please go through the activity dictionary file activity/activities20191011.xml


Normalised activity file - ActivitiesNormalizedE1.020191010.tsv

The columns are as follows:

  • activity_id - unique activity id assigned to activity (E2.0).

  • activities - extracted and normalized activities.

  • wikipedia - wikipedia query string.

  • wikidata - wikidata identifier.
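For checking the TSV against the generated XML, here is a standard-library sketch that turns the four columns above into a simple dictionary document. The element and attribute names (`dictionary`, `entry`, `term`, `wikidataID`, `wikipediaPage`) are assumptions for illustration, not taken from ami-dictionary's documentation. Compare with activities20191011.xml before relying on them; ami-dictionary remains the tool named in this thread.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tsv_to_dictionary(tsv_text: str, title: str = "activity") -> str:
    """Turn rows with activity_id, activities, wikipedia, wikidata columns into XML.

    Element and attribute names are assumptions for illustration.
    """
    root = ET.Element("dictionary", title=title)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        term = (row.get("activities") or "").strip()
        if not term:
            continue  # skip rows with no activity name
        ET.SubElement(root, "entry", {
            "id": (row.get("activity_id") or "").strip(),
            "term": term,
            "wikidataID": (row.get("wikidata") or "").strip(),
            "wikipediaPage": (row.get("wikipedia") or "").strip(),
        })
    return ET.tostring(root, encoding="unicode")
```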

ambarishK avatar Oct 11 '19 10:10 ambarishK

@petermr @ambarishK I have some thoughts to address:

Maybe there’s a different way to go about this that we could consider. First some background on my thinking:

Single-word (or compound-word) activities are very often top-level generic terms (e.g. Anti-Fungal, Anti-Bacterial, Anti-Microbial, Anti-Viral). These are not very useful when trying to formulate solutions that get to the root of a health problem. Be it a pimple or an ear infection: just as not every Essential Oil or EO constituent deemed “Anti-Bacterial” will work against acne, not all antibiotics are effective against an ear infection. To be fit for purpose, finer distinctions are required.

I’m concerned that when auto-processing data (be it from Wikidata or elsewhere) we need to be careful that neither we nor the data confuse/conflate “activity agents” (e.g. pollen) with “Activity” (e.g. Antiallergenic) when the real target is “Histamine”. In that case the real/useful activity we’re looking for is Anti-histaminic, which I consider a sub-type of “Anti-Allergenic”.

This level of Activity classification is akin to “Kingdom, Phylum, Class, Order, Family, Genus, Species”. From months of gathering my own data, I’m convinced that no one has created such a classification system for activity/target/pathway, but this is exactly what is needed. And I think that WE’RE the ones to do this with D.A.V.E.

To make D.A.V.E. truly fit for purpose, we need to distinguish activities in finer detail. Some of this detail already exists in Wikidata*, but we may discover the need/opportunity to submit existing definitions that are not yet included in Wikidata, or coin new terms that need to be defined.

*I typed “Inhibitor” into Wikidata and got a pretty big list, but I’m hoping there is a better way to extract the entire Activity list from wikidata than guessing every type of prefix (Pro-, Anti-), suffix (-stat, -icide), or adjective (promotor, inhibitor) and copying and pasting the results by hand. https://www.wikidata.org/w/index.php?sort=relevance&search=inhibitor+inhibitor&title=Special%3ASearch&profile=advanced&fulltext=1&advancedSearch-current=%7B%22fields%22%3A%7B%22plain%22%3A%5B%22inhibitor%22%5D%7D%7D&ns0=1&ns120=1
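Instead of guessing prefixes and suffixes, one option is to walk Wikidata's "subclass of" (P279) links down from a chosen root item. A minimal sketch, assuming the activity classes of interest are modelled as subclasses. Some drug classes may instead use "instance of" (P31), which this query would miss. The root QID is a parameter (Q4713968, the aldose reductase inhibitor item from Ambarish's table, is used only as an example), and we have not checked how large or useful the hierarchy under any particular root is. The function name is hypothetical; the output can be pasted into query.wikidata.org.

```python
import re


def build_subclass_query(root_qid: str, lang: str = "en", limit: int = 5000) -> str:
    """List every item reachable from root_qid via 'subclass of' (P279) links.

    P279* also matches zero steps, so root_qid itself is in the results.
    """
    if not re.fullmatch(r"Q\d+", root_qid):
        raise ValueError(f"not a Wikidata item ID: {root_qid!r}")
    return f"""SELECT DISTINCT ?item ?itemLabel ?itemDescription WHERE {{
  ?item wdt:P279* wd:{root_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
LIMIT {int(limit)}"""


print(build_subclass_query("Q4713968"))
```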

For reference, take a look at this PDF with Anti-Allergenic data from the spreadsheet I provided. In it, Anti-Allergenic not only refers to the allergy trigger (e.g. pollen), but also defines the activity by the PATHWAY or PATHWAY COMPONENT on which it acts.

To make D.A.V.E. a viable/reliable tool, one that the scientific communities and formulators alike can rely on to separate the wheat from the chaff, this level of activity distinction will absolutely be required. And from what I understand so far about semantic search engines and NLP, I think it’s doable.

While not yet certain I have a firm handle on the operation of the Semantic search engine, what I imagine is that ultimately it will entail searching the literature for Natural Language variations like:

“inhibits Bradykinin action”, “inhibition of Bradykinin(s)”, “shows anti-Bradykinin activity”, “counters Bradykinin activity”

...to relate it to either an existing activity or one newly defined by us, such as “Bradykinin inhibitor”, with the rest, I suppose, being treated as synonyms in a lookup table that will be available when DAVE is online and users can input search requests that DAVE will relate to our top-level term. (Please correct me where I’m wrong.)
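The phrase-to-term mapping described above can be illustrated with a few regular expressions that catch the listed phrasings and report them under one canonical term. This is a crude stand-in for the semantic search and NLP tooling discussed in the thread, not how those tools work. The patterns and function names are hypothetical and would grow with the prefix/suffix lists.

```python
import re


def phrase_patterns(target: str) -> list:
    """Regexes for a few natural-language ways of saying 'acts against <target>'."""
    t = re.escape(target)
    return [
        rf"\binhibit(?:s|ed|ing)?\s+{t}s?\b",  # inhibits Bradykinin
        rf"\binhibition\s+of\s+{t}s?\b",  # inhibition of Bradykinin(s)
        rf"\banti-?{t}\b",  # anti-Bradykinin
        rf"\bcounter(?:s|ed|ing)?\s+{t}s?\b",  # counters Bradykinin
    ]


def find_mentions(text: str, target: str, canonical: str) -> list:
    """Return (matched phrase, canonical term) pairs, grouped by pattern order."""
    hits = []
    for pattern in phrase_patterns(target):
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.group(0), canonical))
    return hits
```

Each matched phrase would be a synonym row pointing at the canonical term, e.g. "Bradykinin inhibitor".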

So here’s my suggested approach with regard to ACTIVITIES:

I chatted with @ambarishK last night (2019-10-10), to find out more about how he accesses Wikidata to gather “categories” of information as well as to understand the steps he takes to manually normalize our activity data with what exists on Wikidata.

What I propose is: rather than starting with a LIMITED list of activities from EssOildb and Verriclear’s databases and then cross-referencing that to wikidata, that we instead:

  1. Determine which of the categories of interest listed below exist in Wikidata
  2. Download them all as a table with each category being the column header
  3. I will do the normalization using his steps (Since I’ve been working on collecting and defining activities for some months now, I’m more familiar with them than he is.)

I’ll update as soon as I see what @ambarishK can get.

This is the list of categories I HOPE exists in wikidata:

  • Activity Class [wikidata: subclass of] (even if just as a drop-down list that helps users fine-tune their searches)
  • Wikidata: Commons category (aka commonscat, category Commons)
  • Wikidata: topic's main category (aka. main category, category for this topic, subject category, has category)
  • Activity [wikidata: Label] (activity (Q1914636): event; actions that result in changes of state)
  • Activity Definition:
  • Activity Description:
  • General Information:
  • Reported Uses (Diseases, Symptoms, etc.)
  • Site of Action in human body: (eg: skin, liver, muscles)
  • Mechanism of Action
  • Activity Target (Organism: Specific Bacteria, fungi, viruses, etc.)
  • Activity Target (Related Pathways: COX1,2, Inflammation, etc.)
  • Conditions associated with increased/decreased [Compound or Activity]:
  • Substances that increase [Compound or Activity]:
  • Substances that decrease [Compound or Activity]:
  • Reported Health Benefits:
  • Found in: [Plants, Essential Oils, Foods, Drugs]
  • Interacts with: [Genes, Diseases, Drugs, Compounds]
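The first three categories in this list correspond to Wikidata properties: P279 (subclass of), P373 (Commons category) and P910 (topic's main category). A sketch of step 2 ("download them all as a table") for those three is below. For each QID it fetches label, description and the three properties, then writes the SPARQL JSON results as a TSV. Items with several values for a property produce one row per combination. The remaining categories (definition, mechanism, site of action, and so on) have no single property we can vouch for, so they are left out. Function and column names are hypothetical.

```python
import csv
import io
import re

# Wikidata properties for the first three categories in the list above.
PROPERTIES = {"subclassOf": "P279", "commonsCategory": "P373", "mainCategory": "P910"}
COLUMNS = ["item", "itemLabel", "itemDescription", *PROPERTIES]


def build_category_query(qids, lang: str = "en") -> str:
    """SPARQL selecting label, description and the three properties for each QID."""
    for qid in qids:
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    select = " ".join("?" + column for column in COLUMNS)
    values = " ".join(f"wd:{qid}" for qid in qids)
    optionals = "\n  ".join(
        f"OPTIONAL {{ ?item wdt:{pid} ?{name} . }}" for name, pid in PROPERTIES.items()
    )
    return f"""SELECT {select} WHERE {{
  VALUES ?item {{ {values} }}
  {optionals}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}"""


def bindings_to_tsv(results: dict) -> str:
    """One TSV row per result binding; missing values become empty cells."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for binding in results["results"]["bindings"]:
        writer.writerow([binding.get(col, {}).get("value", "") for col in COLUMNS])
    return out.getvalue()
```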

Besides being useful (I think) for creating more semantic phrases (?) that help us search articles, having the extra data in such a table would help me make sure

BOTTOM LINE: I’m hoping we don’t have to… but if WE must be the ones to set the standards for new definitions that don't exist in Wikidata or even the general literature, then I’m definitely down for that. :)

P.S. Since I began searching for and distinguishing activities for my own purposes, I’ve been compiling a list of prefixes, suffixes and other words that describe “states or state changes” that may be useful in searching for and coining new activities. (This includes “modifiers” such as pro- vs anti-, inhibitor vs promotor, -static vs -icide, etc.) Peter likely has lists like this too, but it never hurts to double-check.
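A modifier list like this can be turned into candidate search terms by combining it with targets, as in the sketch below. The modifier tuples are illustrative picks from the P.S. (using the "promoter" spelling), not a complete list. The output is only a set of candidates: generated strings may not be real terms, or may normally be written closed up (e.g. "antihistamine"). The -static/-icide forms need a hand-supplied combining stem such as "fungi", because the stem changes between words. Function names are hypothetical.

```python
from itertools import product

# Illustrative modifiers taken from the P.S. above; not a complete list.
PREFIXES = ("anti-", "pro-")
NOUN_SUFFIXES = (" inhibitor", " promoter")
ENDINGS = ("static", "cide")


def target_terms(targets) -> list:
    """anti-/pro- prefixes and inhibitor/promoter nouns for each target."""
    out = []
    for target in targets:
        base = target.strip().lower()
        out += [prefix + base for prefix in PREFIXES]
        out += [base + suffix for suffix in NOUN_SUFFIXES]
    return out


def combining_forms(stems) -> list:
    """-static / -cide forms; stems must be supplied by hand (e.g. 'fungi')."""
    return [stem + ending for stem, ending in product(stems, ENDINGS)]


print(target_terms(["Histamine"]))
# ['anti-histamine', 'pro-histamine', 'histamine inhibitor', 'histamine promoter']
print(combining_forms(["fungi"]))
# ['fungistatic', 'fungicide']
```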

EmanuelFaria avatar Oct 11 '19 13:10 EmanuelFaria