CEVOpen
📕 Documentation: Dictionary.xml and DictionaryDescription.md of: eoAnalysisInstrument (inactive)
Created a simple dictionary by hand (it can be extended later):
<dictionary title="instrument">
<desc>Hacked from a few papers PMR 20190904</desc>
<entry term="HP6890" name="HP6890"/>
<entry term="QP-5000" name="QP-5000"/>
<entry term="QP" name="QP"/>
<entry term="QP2010" name="QP2010"/>
<entry term="QP2010S" name="QP2010S"/>
<entry term="Shimadzu" name="Shimadzu"/>
<entry term="Clevenger" name="Clevenger"/>
</dictionary>
NOTE: term is used for searching (maybe with stemming).
NOTE: these are probably not in Wikidata. Also, Clevenger is not an instrument and should be removed.
name is descriptive.
The title attribute on dictionary must match the filename.
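The rule that title must match the filename can be checked before running a search. The following is a minimal sketch in Python, not part of ami itself: the function name check_dictionary is hypothetical, "filename" is taken to mean the file name without its .xml extension, and the duplicate-term check is an extra assumption of mine rather than a documented requirement.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_dictionary(xml_text, filename):
    """Return a list of problems found in a dictionary (empty list if none).

    Hypothetical helper: it only checks the conventions described in the notes above.
    """
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "dictionary":
        problems.append(f"root element is <{root.tag}>, expected <dictionary>")
    # Assumption: the title must equal the filename without its extension.
    stem = Path(filename).stem
    title = root.get("title")
    if title != stem:
        problems.append(f"title {title!r} does not match filename {stem!r}")
    seen = set()
    for entry in root.iter("entry"):
        term = entry.get("term")
        if not term:
            problems.append("entry without a term attribute")
        elif term in seen:
            # Assumption: duplicated search terms are unintended.
            problems.append(f"duplicate term {term!r}")
        else:
            seen.add(term)
    return problems


EXAMPLE = """<dictionary title="instrument">
  <desc>Hacked from a few papers PMR 20190904</desc>
  <entry term="HP6890" name="HP6890"/>
  <entry term="Shimadzu" name="Shimadzu"/>
</dictionary>"""

if __name__ == "__main__":
    print(check_dictionary(EXAMPLE, "mydictionaries/instrument.xml"))
    print(check_dictionary(EXAMPLE, "mydictionaries/instruments.xml"))
```

The first call reports no problems; the second reports the title mismatch because the file stem is "instruments".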
searching with dictionaries
cd CEVOpen
Verify that oil186 is a subdirectory:
ls oil186
then search:
ami-search -p oil186 --dictionary species country mydictionaries/instrument.xml
species is a builtin search, country is a builtin dictionary, and mydictionaries/instrument.xml is relative to the current directory.
Per-document results are in PMC*/results/search/instrument/results.xml etc., and are aggregated in
/some/where/.../CEVOpen/oil186/search.instrument.snippets.xml
as:
<projectSnippetsTree>
<snippetsTree>
<snippets file="oil186/PMC4391421/results/search/instrument/results.xml">
<result pre="Ph. Eur. 5.0 [ 3 ], by using a modified" exact="Clevenger" post="apparatus (with the EO collection area cooled to prevent"/>
<result pre="chromatography-mass spectrometry. Samples were analyzed by gas chromatography using a" exact="HP6890" post="instrument coupled with a HP 5973 mass spectrometer. The"/>
</snippets>
</snippetsTree>
<snippetsTree>
<snippets file="oil186/PMC5080681/results/search/instrument/results.xml">
<result pre="500 ml deionized water. Then, the flask was connected with" exact="Clevenger" post="apparatus, which was placed in the same apparatus. While"/>
<result pre="the fresh weight. GC-MS analysis GC-MS chromatograms were recorded using" exact="Shimadzu" post="QP-5000 GC-MS. The GC was equipped with Rtx-5 ms"/>
<result pre="fresh weight. GC-MS analysis GC-MS chromatograms were recorded using Shimadzu" exact="QP-5000" post="GC-MS. The GC was equipped with Rtx-5 ms column"/>
</snippets>
</snippetsTree>
Each CTree (PMC document) is searched and its matches are collected in one snippetsTree; each result XML element is in W3C Annotation format (pre, exact, post).
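A table of counts per PMC ID and term can be derived from this structure. The sketch below is one possible way to do it in Python with the standard library; it is not how ami builds its own tables. The function names count_terms and to_tsv are hypothetical, the PMC ID is assumed to be recoverable from the file attribute, and the example XML is an abbreviated copy of the excerpt above with a closing projectSnippetsTree tag added so that it parses.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def count_terms(snippets_xml):
    """Map each PMC ID to a Counter of matched terms (the 'exact' attribute)."""
    root = ET.fromstring(snippets_xml)
    table = {}
    for snippets in root.iter("snippets"):
        # Assumption: the PMC ID is embedded in the path, e.g. oil186/PMC4391421/results/...
        path = snippets.get("file", "")
        match = re.search(r"PMC\d+", path)
        pmcid = match.group(0) if match else path
        counts = table.setdefault(pmcid, Counter())
        for result in snippets.iter("result"):
            counts[result.get("exact")] += 1
    return table


def to_tsv(table):
    """Render the counts as tab-separated text: one row per PMC ID, one column per term."""
    terms = sorted({term for counts in table.values() for term in counts})
    lines = ["\t".join(["PMCID"] + terms)]
    for pmcid in sorted(table):
        row = [str(table[pmcid][term]) for term in terms]  # missing terms count as 0
        lines.append("\t".join([pmcid] + row))
    return "\n".join(lines)


# Abbreviated from the excerpt above; pre/post text shortened for readability.
EXAMPLE = """<projectSnippetsTree>
 <snippetsTree>
  <snippets file="oil186/PMC4391421/results/search/instrument/results.xml">
   <result pre="by using a modified" exact="Clevenger" post="apparatus"/>
   <result pre="gas chromatography using a" exact="HP6890" post="instrument"/>
  </snippets>
 </snippetsTree>
 <snippetsTree>
  <snippets file="oil186/PMC5080681/results/search/instrument/results.xml">
   <result pre="was connected with" exact="Clevenger" post="apparatus"/>
   <result pre="were recorded using" exact="Shimadzu" post="QP-5000 GC-MS."/>
   <result pre="recorded using Shimadzu" exact="QP-5000" post="GC-MS."/>
  </snippets>
 </snippetsTree>
</projectSnippetsTree>"""

if __name__ == "__main__":
    print(to_tsv(count_terms(EXAMPLE)))
```

For a real project, the string could be replaced by the contents of oil186/search.instrument.snippets.xml, for example via ET.parse(path).getroot(), assuming that file is well-formed.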
Simple grep that finds mass spec:
grep -r -E -o ".{0,50}mass spectromet.{0,50}" PMC*/scholarly.html
This searches all the HTML for "mass spectromet" and gives up to 50 characters either side.
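The same keyword-in-context idea can be sketched in Python, which may be convenient if the matches need further processing. This is only an illustration of the pattern used in the grep command: the function names keyword_in_context and search_files are hypothetical, and, like grep, the match does not cross line breaks.

```python
import re
from pathlib import Path


def keyword_in_context(text, keyword, width=50):
    """Return each occurrence of keyword with up to `width` characters either side."""
    # Same shape as the grep pattern: .{0,width} + keyword + .{0,width}
    pattern = re.compile(r".{0,%d}%s.{0,%d}" % (width, re.escape(keyword), width))
    return pattern.findall(text)


def search_files(paths, keyword, width=50):
    """Yield (path, snippet) pairs for every match in the given files."""
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for snippet in keyword_in_context(text, keyword, width):
            yield str(path), snippet


if __name__ == "__main__":
    sample = "using a HP6890 instrument coupled with a HP 5973 mass spectrometer. The"
    for snippet in keyword_in_context(sample, "mass spectromet", width=20):
        print(snippet)
```

To run it over a project, search_files could be given something like Path("oil186").glob("PMC*/scholarly.html"), assuming the HTML files sit directly under each PMC directory.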
Hello,
I am working on how to migrate the article/instrument matches to Wikidata.
The XML with the excerpts is fantastic, but my XML processing skills are still incipient. I remember having seen in the sprint a summary table with the PMC IDs in one column and counts for each term in another column.
Would you know how I can obtain this summary file?
EDIT: Even though I'm still not able to generate the full HTML table, I could draft some code to migrate to Wikidata from the full table. The code is at https://github.com/caffiendFrog/elife2019/tree/master/wikidatamigration
One of the pages edited: https://www.wikidata.org/wiki/Q44476657
This is wonderful, Tiago. If you check out oil186/ you will find fulldatatables.html, which I think is what you want.
(In reply to Tiago Lubiana's message of Thu, 12 Sep 2019, 19:12.)
Tiago, can you send me your email (by email to peter.murray.rust AT gmail DOT com) so I can connect you with others.
Manny - meet Tiago, who is in Sao Paulo. Tiago was part of our eLife sprint and worked on the Instruments and how you put this data into Wikidata! So his knowledge will be really valuable for missing Wikidata items. Tiago, Manny is in Brasilia and is pulling together the CEVOpen project management of extracting plants and their oils from the literature.
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK