CEVOpen
compound synonyms and stereochemistry
The compound names in table columns are frequently ambiguous. The first table examined is https://github.com/petermr/CEVOpen/blob/master/articleAnalysis/oil186/raw/thyme.tsv

| Compound | Compound_dictionary_lookup | E2.0_compound_identifiers | notes | wikidata_identifier |
|---|---|---|---|---|
| alpha-Thujene | (-)-alpha-thujene ; (+)-alpha-thujene | C764 ; C786 | Stereo-isomers of the compound are present. | Q27121815 ; Q27121804 |
| alpha-Pinene | alpha-Pinene | C2849 | Stereo-isomers of the compound are also present. | Q27104380 |
| beta-Pinene | beta-Pinene | C349 | Stereo-isomers of the compound are also present. | |
| beta-Myrcene | beta-Myrcene | C345 | | Q424577 |
| alpha-Phellandrene | alpha-Phellandrene | C2848 | | Q19606345 |
| Carene<δ-2-> | 2-carene | C1720 | Lookup is of '2-carene'. | |
| D-Limonene | (+)-limonene | C792 | | Q27888324 |
| beta-Phellandrene | beta-Phellandrene | C3426 | | Q19606727 |
| para-Cymene | cymene | C4118 | Other cymenes are present as 'm-cymenene', 'dehydro-p-cymene' and 'o-cymene'. | Q284072 |
| gamma-Terpinene | beta-terpinene | C355 | Present as beta-terpinene. | Q23057921 |
| Terpineol | 1-terpineol | C1482 | | Q27276701 |
| Terpinen-4-ol | (+)-terpinen-4-ol | C795 | | Q27280168 |
| Thymol | | | Not present. | |
| Caryophyllene | (z)-caryophyllene ; 9-epi-(E)-caryophyllene ; alpha-caryophyllene | C1255 ; C2705 ; C2915 | Stereo-isomers are present. | NA ; Q27137093 ; Q1995108 |

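
The ambiguous rows in this table are the ones whose lookup returned several candidates separated by " ; ". A minimal sketch for flagging them automatically, assuming the TSV uses the column names shown above and " ; " as the separator (the sample rows are copied from the table; `split_cell` and `classify_rows` are hypothetical helpers, not part of CEVOpen):

```python
import csv
import io

# Hypothetical excerpt laid out like thyme.tsv; rows copied from the table above.
SAMPLE_TSV = (
    "Compound\tCompound_dictionary_lookup\tE2.0_compound_identifiers\tnotes\twikidata_identifier\n"
    "alpha-Thujene\t(-)-alpha-thujene ; (+)-alpha-thujene\tC764 ; C786\t\tQ27121815 ; Q27121804\n"
    "beta-Myrcene\tbeta-Myrcene\tC345\t\tQ424577\n"
    "Thymol\t\t\tnot present.\t\n"
)


def split_cell(cell: str) -> list[str]:
    """Split a ' ; '-separated cell into stripped, non-empty values."""
    return [part.strip() for part in cell.split(";") if part.strip()]


def classify_rows(tsv_text: str) -> dict[str, str]:
    """Label each compound 'unique', 'ambiguous' (several IDs) or 'missing' (no ID)."""
    result = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        ids = split_cell(row["E2.0_compound_identifiers"] or "")
        if not ids:
            result[row["Compound"]] = "missing"
        elif len(ids) > 1:
            result[row["Compound"]] = "ambiguous"
        else:
            result[row["Compound"]] = "unique"
    return result


if __name__ == "__main__":
    for compound, status in classify_rows(SAMPLE_TSV).items():
        print(f"{compound}\t{status}")
```

Only the identifier column is inspected here; a row like gamma-Terpinene, which resolves to a single but wrong entry (beta-terpinene), would still be reported as 'unique' and needs manual review.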
implementing
- add `<synonym>` child elements to dictionary `<entry>` elements
- look up unknowns in Wikidata and identify synonyms of existing entries
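
Adding synonyms could look roughly like the following sketch using Python's standard `xml.etree.ElementTree`. The dictionary fragment and its `term` attribute are assumptions for illustration, not a confirmed CEVOpen dictionary schema; the synonym pairs come from the cuminal/cuminaldehyde and p-cymen-7-ol examples in this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary fragment; the 'term' attribute is an assumption.
DICTIONARY_XML = """<dictionary title="oil_compounds">
  <entry term="p-cymen-7-ol" />
  <entry term="cuminaldehyde" />
</dictionary>"""


def add_synonym(entry: ET.Element, name: str) -> bool:
    """Append <synonym>name</synonym> unless it matches the term or an existing synonym."""
    existing = {entry.get("term", "").lower()}
    existing.update((syn.text or "").lower() for syn in entry.findall("synonym"))
    if name.lower() in existing:
        return False
    ET.SubElement(entry, "synonym").text = name
    return True


def add_synonyms(root: ET.Element, synonyms: dict[str, list[str]]) -> None:
    """`synonyms` maps an entry's term to the alternative names to attach to it."""
    for entry in root.iter("entry"):
        for name in synonyms.get(entry.get("term", ""), []):
            add_synonym(entry, name)


if __name__ == "__main__":
    root = ET.fromstring(DICTIONARY_XML)
    add_synonyms(root, {"cuminaldehyde": ["cuminal", "cuminaldehyde"],
                        "p-cymen-7-ol": ["para-cymen-7-ol"]})
    print(ET.tostring(root, encoding="unicode"))
```

The duplicate check is case-insensitive, so re-running the script does not pile up repeated `<synonym>` elements.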

We will start by creating a bag of unknown terms.
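
A "bag" here is a multiset: each unknown term together with how often it occurs, so the most frequent unknowns can be looked up first. A minimal sketch with `collections.Counter`, assuming a term is "unknown" when it has no case-insensitive exact match in the dictionary (the harvested terms below are illustrative only):

```python
from collections import Counter


def bag_of_unknowns(table_terms: list[str], dictionary_terms: set[str]) -> Counter:
    """Count how often each table term lacks a case-insensitive dictionary match."""
    known = {term.lower() for term in dictionary_terms}
    bag: Counter = Counter()
    for term in table_terms:
        cleaned = term.strip()
        if cleaned and cleaned.lower() not in known:
            bag[cleaned] += 1
    return bag


if __name__ == "__main__":
    # Illustrative terms as they might be harvested from several tables.
    harvested = ["Thymol", "alpha-Pinene", "Thymol", "gamma-Terpinene"]
    dictionary = {"alpha-pinene", "beta-terpinene"}
    print(bag_of_unknowns(harvested, dictionary).most_common())
```

Exact matching is deliberately strict: it sends gamma-Terpinene to the bag rather than silently accepting the beta-terpinene lookup seen in the thyme table.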
analysing isomerism and synonyms
We need to sort compounds by WikidataID and PubchemCID to determine synonyms. Example:
para-cymen-7-ol 325 4-Isopropylbenzyl alcohol
p-cymen-7-ol p-cymen-7-ol 325 4-Isopropylbenzyl alcohol
These two entries share the same CID (325), so they should be grouped together. PMR will then decide which one is best to keep.
cuminaldehyde cuminaldehyde cuminaldehyde Q419952 326 4-Isopropylbenzaldehyde
cuminal cuminal cuminaldehyde Q419952 326 4-Isopropylbenzaldehyde
octanal has both a Wikidata ID and a PubChem CID.
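
The grouping step can be sketched as follows: collect terms under each non-blank identifier and keep only identifiers shared by two or more terms, since those are the synonym candidates. The rows are transcribed from the examples above; the key names `term`, `wikidataID` and `pubchemCID` are hypothetical, and a blank Wikidata ID means the example shows none.

```python
from collections import defaultdict

# Rows transcribed from the examples above; key names are hypothetical.
ROWS = [
    {"term": "para-cymen-7-ol", "wikidataID": "", "pubchemCID": "325"},
    {"term": "p-cymen-7-ol", "wikidataID": "", "pubchemCID": "325"},
    {"term": "cuminaldehyde", "wikidataID": "Q419952", "pubchemCID": "326"},
    {"term": "cuminal", "wikidataID": "Q419952", "pubchemCID": "326"},
]


def synonym_groups(rows: list[dict[str, str]], key: str) -> dict[str, list[str]]:
    """Group terms sharing the same non-blank identifier; keep groups of two or more."""
    groups = defaultdict(list)
    for row in rows:
        ident = row.get(key, "").strip()
        if ident:
            groups[ident].append(row["term"])
    return {ident: terms for ident, terms in groups.items() if len(terms) > 1}


if __name__ == "__main__":
    print(synonym_groups(ROWS, "pubchemCID"))
    print(synonym_groups(ROWS, "wikidataID"))
```

Grouping on the CID catches the p-cymen-7-ol pair even though neither row has a Wikidata ID, which is why both identifiers are worth checking.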
sort TSV file by WikidataID and remove synonyms
@ambarishK will sort the table in a spreadsheet on the WikidataID column (notFoundWIKIDATASortedPubChem.tsv). PMR will then edit this manually.
sort TSV file by PubchemCID and remove synonyms
@ambarishK will sort the table in a spreadsheet on the PubChemID column (notFoundWIKIDATAPubChemSorted.tsv). PMR will then edit this manually.
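
Both spreadsheet sorts can also be reproduced in a script, which makes the sort repeatable when the files are regenerated. A sketch assuming tab-separated input with a header row; the column names in the sample are hypothetical stand-ins for the real WikidataID and PubChem columns.

```python
import csv
import io

# Illustrative input; column names are hypothetical stand-ins.
UNSORTED_TSV = (
    "term\twikidataID\tpubchemCID\n"
    "cuminal\tQ419952\t326\n"
    "para-cymen-7-ol\t\t325\n"
    "cuminaldehyde\tQ419952\t326\n"
    "p-cymen-7-ol\t\t325\n"
)


def sort_tsv(tsv_text: str, column: str) -> str:
    """Sort TSV rows on `column`, putting blank cells last.

    Plain string order is not numeric order for Q-numbers or CIDs,
    but it is enough to make rows with identical identifiers adjacent.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = reader.fieldnames or []

    def sort_key(row: dict) -> tuple[bool, str]:
        value = (row.get(column) or "").strip()
        return (value == "", value)

    rows = sorted(reader, key=sort_key)  # stable: ties keep their input order
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(sort_tsv(UNSORTED_TSV, "pubchemCID"))
```

Putting blank identifiers last keeps the rows that still need a Wikidata or PubChem lookup together at the bottom, where they are easy to hand-edit.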
The recommitted files will be normalized to a single reference each for Wikidata and PubChem. PMR will then merge them and resolve possible conflicts and fuzziness.
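
Assuming the aim of normalization is a one-to-one correspondence between Wikidata IDs and PubChem CIDs, the "possible conflicts" are identifiers on one side that map to more than one identifier on the other. A sketch that lists them for manual review; the key names are hypothetical, and the third row is deliberately fake ("CID-X" is a placeholder) so that a conflict appears.

```python
from collections import defaultdict

ROWS = [
    {"term": "cuminaldehyde", "wikidataID": "Q419952", "pubchemCID": "326"},
    {"term": "cuminal", "wikidataID": "Q419952", "pubchemCID": "326"},
    # Deliberately fake row to demonstrate a conflict; "CID-X" is a placeholder.
    {"term": "hypothetical-isomer", "wikidataID": "Q419952", "pubchemCID": "CID-X"},
]


def find_conflicts(rows: list[dict[str, str]]) -> tuple[dict, dict]:
    """Return (Wikidata IDs with >1 CID, CIDs with >1 Wikidata ID)."""
    wd_to_cids = defaultdict(set)
    cid_to_wds = defaultdict(set)
    for row in rows:
        wd = row.get("wikidataID", "").strip()
        cid = row.get("pubchemCID", "").strip()
        if wd and cid:  # only rows carrying both identifiers can conflict
            wd_to_cids[wd].add(cid)
            cid_to_wds[cid].add(wd)
    wd_conflicts = {wd: sorted(cids) for wd, cids in wd_to_cids.items() if len(cids) > 1}
    cid_conflicts = {cid: sorted(wds) for cid, wds in cid_to_wds.items() if len(wds) > 1}
    return wd_conflicts, cid_conflicts


if __name__ == "__main__":
    print(find_conflicts(ROWS))
```

Stereo-isomers such as the two alpha-thujene entries may legitimately have separate identifiers, so a reported conflict is a prompt for PMR's judgment rather than an automatic error.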