openVirus
ami search: some files empty for customized dictionary
ami search produces empty histogram.csv files and some empty XML files for my latest dictionary, although other HTML files such as full.dataTables.html are generated correctly. The error I am getting is:
Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54-NEXT-SNAPSHOT/pmcstop.txt
Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54-NEXT-SNAPSHOT/stopwords.txt
PMC3561042 .PMC6517453 !wPMC6695746 PMC7102705 PMC7119083 PMC7120695 PMC7197577 PMC7241517 PMC7341712 !wPMC7395586 .....
create data tables
Null pluginOption'
Dictionary Used: DICTIONARY_COUNTRY
Command:
ami -p ami_12_08_2020/try_for_ami_search_1 search --dictionary ami_12_08_2020/country_final.xml
Output: ALL OUTPUT FILES
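To pin down which output files are actually empty, a small script can walk the project folder and list them. This is only a sketch: it assumes "empty" means zero bytes (a file containing just a header row would not be caught), and the suffix list is a guess at the file types of interest rather than anything defined by ami.

```python
from pathlib import Path


def find_empty_files(project_dir, suffixes=(".csv", ".xml", ".html")):
    """Return sorted paths of zero-byte files with the given suffixes.

    Only files whose size is exactly 0 bytes are reported; the suffix
    tuple is an assumption about which outputs matter.
    """
    root = Path(project_dir)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Project path taken from the command in this issue.
    for path in find_empty_files("ami_12_08_2020/try_for_ami_search_1"):
        print(path)
```

Posting the resulting list alongside the command makes it clearer whether every CTree is affected or only some.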
The first paragraph refers to ami search but the second refers to amidict.
Can you please clarify which? If there are two issues, please separate them.
Did you create a dictionary successfully with amidict? If so this should be independent of using ami search. If the search fails with every dictionary, then the problem is with ami search - if it fails only with the one created by amidict please make sure that is uploaded.
There seems to be a problem with the cooccurrence for some people and not others. This may be because the dictionary is not correct. Ideally I need:
ami -p P search --dictionary A // works
ami -p P search --dictionary B // does not work
Then it's probably the dictionary.
P.
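The A/B comparison suggested here can be scripted so that both runs use the same project and only the dictionary changes. The sketch below uses only the `ami -p ... search --dictionary ...` form already quoted in this issue; the helper names are hypothetical, and the choice of `disease` as the working built-in dictionary is an assumption based on later comments.

```python
import shutil
import subprocess


def search_command(project, dictionary):
    """Build the ami search command exactly as written in this issue."""
    return ["ami", "-p", project, "search", "--dictionary", dictionary]


def run_search(project, dictionary):
    """Run one search and return (exit code, combined stdout and stderr)."""
    cmd = search_command(project, dictionary)
    # Resolve the executable so that wrappers such as ami.bat are found.
    cmd[0] = shutil.which("ami") or cmd[0]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    project = "ami_12_08_2020/try_for_ami_search_1"
    # A: a built-in dictionary (assumed to work); B: the custom dictionary.
    dictionaries = ("disease", "ami_12_08_2020/country_final.xml")
    if shutil.which("ami") is None:
        print("ami was not found on PATH; nothing was run")
    else:
        for dictionary in dictionaries:
            code, log = run_search(project, dictionary)
            print(dictionary, "-> exit code", code)
            print(log[-500:])  # the tail of the log usually holds the error
```

If A succeeds and B fails on the same project, the two logs give a minimal pair to attach to a bug report.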
I am sorry, I copied the wrong command. It is supposed to be about ami search only. I have updated the issue. My apologies for the inconvenience.
The query works fine for the inbuilt dictionary but not for my own (which I created using a SPARQL query and then converted using amidict). I shall certainly try other variants as well, but I believe @Priya-Jk-15 has the same issue.
If you can post a list of the files you used (please make them available on github), the commands, and the problem I will try to solve it.
P.
In reply to Ambreen H, Thu, Aug 13, 2020 at 4:58 PM (comment above).
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
I tried the following query to create a new corpus, and repeated it several times, changing the number of downloaded articles from 10 to 950.
GET PAPERS QUERY: getpapers -q "viral epidemics" -o ami_12_08_2020/try_for_ami_search_1 -f v_epid/log.txt -x -p -k 10
AMI SEARCH: ami -p ami_12_08_2020/try_for_ami_search_1 search --dictionary ami_12_08_2020/country_final.xml
I have committed the smaller folder here: TEST_FOLDER_WITH_RESULT
Dictionary Used: DICTIONARY_COUNTRY
@Prasinus818 When I ran ami search, I gave neither the path nor the folder name of the disease dictionary. I only gave --dictionary disease, and I think it used only the inbuilt dictionary, since when I replaced disease with drug and virus, ami search didn't create DataTables. Will ami search be able to find the dictionary from its folder name? I ask because I think you have used the country dictionary's folder name, country_final.xml. Please clarify.
PLEASE use one issue per topic.
This Issue contains material on getpapers, ami search, amidict .
An issue reporting a bug should contain the minimal information to reproduce the bug.
I suggest opening new issue(s) which contain a precise statement of the problem. It helps if all the files are small, e.g. 10 CTrees for a CProject.
I created a small disease dictionary with 10 entries which is at https://github.com/petermr/openVirus/blob/master/dictionaries/diseases/amisearch%20issue/disease.xml . I validated the dictionary using the syntax amidict -v --dictionary dic display --fields --validate.
Then, I used the dictionary for ami search on a corpus of 5 CTrees. It created only empty DataTables, which are at https://github.com/petermr/openVirus/tree/master/dictionaries/diseases/amisearch%20issue/virepi
@petermr please check it.
@petermr,
I am trying to use ami search with the customised dictionary funder, which is committed at: https://github.com/petermr/openVirus/blob/master/dictionaries/funders/funder.xml
I tested it on a corpus of 10 articles, which were downloaded using the getpapers query:
getpapers -q "viral epidemic" -o minicorpus10 -x -k 10
The ami search command I used was:
ami -p minicorpus10 search --dictionary C:\Users\me\funder.xml
It did not create full.datatables.html, and the _cooccurrence output was empty. When I searched the same corpus with the built-in dictionary funders, it worked well. This suggests that the corpus is fine and that the problem lies in my dictionary.
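When a custom dictionary is the suspect, a quick structural check of the XML can rule out the simplest causes before anything is re-run. The sketch below is not an ami tool: it assumes the dictionary is a `dictionary` root element carrying a `title` attribute, with `entry` children that each carry a `term` attribute and no XML namespace. If the real file differs, the counts will simply come back as zero.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def summarize_dictionary(xml_data):
    """Summarise a dictionary given as XML text or bytes.

    Assumes <dictionary title="..."> with <entry term="..."/> children
    (an assumption about the format, not a specification).
    """
    root = ET.fromstring(xml_data)
    terms = [entry.get("term") for entry in root.iter("entry")]
    counts = Counter(t for t in terms if t)
    return {
        "root_tag": root.tag,
        "title": root.get("title"),
        "entries": len(terms),
        "missing_term": sum(1 for t in terms if not t or not t.strip()),
        "duplicate_terms": sorted(t for t, n in counts.items() if n > 1),
    }


if __name__ == "__main__":
    # Dictionary path taken from the search command in this issue.
    path = Path("ami_12_08_2020/country_final.xml")
    if path.is_file():
        print(summarize_dictionary(path.read_bytes()))
    else:
        print(path, "not found")
```

A root tag other than `dictionary`, zero entries, or many entries without a term would each point at the conversion step rather than at the search.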
Thanks, that's a very clear summary. I will try to add summarization to the results of search. Will take 2-3 hours...
In reply to VAISHALI ARORA, Thu, Aug 20, 2020 at 4:56 PM (comment above).
@petermr I validated my created dictionary and got the following output:
Generic values (DictionaryDisplayTool)
================================
-v to see generic values
Specific values (DictionaryDisplayTool)
================================
--testString : d null
--wikilinks : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@14751b3
--fields : d []
--files : d []
--maxEntries : d 3
--remote : d [https://github.com/petermr/dictionary]
--suffix : d xml
--validate : m true
--help : d false
--version : d false
--dictionary : d [disease]
--directory : d dic
Dictionary: disease
entries: 13814
myopia
psychosis
psychosis
....
Then, I used the dictionary to search a corpus of 10 CTrees. I got _cooccurrence, and the results folder has results.xml, but full.datatables.html has only frequencies. The results are at https://github.com/petermr/openVirus/tree/master/examples/Priya/amisearch_issue/virepi .
The tree of the corpus:
Folder PATH listing for volume OS
Volume serial number is 845F-351F
C:.
├───10.1101
│   ├───2020.06.10.20127597
│   ├───results
│   │   └───search
│   │       └───disease
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───10.11012020.06.10.20127597
│   ├───results
│   │   └───search
│   │       └───disease
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC6517453
│   ├───results
│   │   └───search
│   │       └───disease
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC6695746
│   ├───results
│   │   ├───search
│   │   │   └───disease
│   │   └───word
│   │       └───frequencies
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC7119083
│   ├───results
│   │   ├───search
│   │   │   └───disease
│   │   └───word
│   │       └───frequencies
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC7409732
│   ├───results
│   │   ├───search
│   │   │   └───disease
│   │   └───word
│   │       └───frequencies
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───search.openVirus
│   ├───dictionaries
│   │   └───diseases
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
└───__cooccurrence
    ├───disease
    └───disease-disease
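A folder listing shows which directories exist but not whether they hold any results. As a rough sketch, the script below counts non-empty files under results/search/<dictionary> and results/word/frequencies for each top-level folder, following the layout in the listing above. The function names are hypothetical, and the script treats every top-level folder not starting with "__" as a CTree, so a stray folder such as search.openVirus will appear with zero counts.

```python
from pathlib import Path


def count_nonempty(directory):
    """Count files larger than 0 bytes directly inside directory (0 if absent)."""
    directory = Path(directory)
    if not directory.is_dir():
        return 0
    return sum(1 for p in directory.iterdir() if p.is_file() and p.stat().st_size > 0)


def result_summary(project_dir, dictionary="disease"):
    """Map each top-level folder name to its non-empty result file counts."""
    summary = {}
    for ctree in sorted(p for p in Path(project_dir).iterdir() if p.is_dir()):
        if ctree.name.startswith("__"):
            continue  # skip project-level folders such as __cooccurrence
        summary[ctree.name] = {
            "search_files": count_nonempty(ctree / "results" / "search" / dictionary),
            "frequency_files": count_nonempty(ctree / "results" / "word" / "frequencies"),
        }
    return summary


if __name__ == "__main__":
    project = Path("virepi")  # project folder name taken from the linked results
    if project.is_dir():
        for name, counts in result_summary(project).items():
            print(name, counts)
    else:
        print(project, "not found")
```

CTrees that show frequency files but no search files would match the symptom described here, where full.datatables.html contains only frequencies.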
I was able to create the dictionary and validate it. Please let me know if the validation results are correct @petermr:
Generic values (DictionaryDisplayTool)
================================
-v to see generic values
Specific values (DictionaryDisplayTool)
================================
--testString : d null
--wikilinks : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@2b4c1d96
--fields : d []
--files : d []
--maxEntries : d 3
--remote : d [https://github.com/petermr/dictionary]
--suffix : d xml
--validate : m true
--help : d false
--version : d false
--dictionary : d [country]
--directory : d ami_12_08_2020\amidict10
Dictionary: country
entries: 263
Afghanistan
Albania
Algeria
....
I will update the results in the wiki if they are fine
At the moment if there are no error messages the dictionary is probably fine.
In reply to Ambreen H, Sat, Aug 22, 2020 at 2:03 PM (comment above).
@petermr Please check my comment above regarding this issue. I still think full.datatables.html needs some changes.