
ami search: some files empty for customized dictionary

Open AmbrineH opened this issue 5 years ago • 14 comments

With my latest dictionary, ami search produces an empty histogram.csv and some empty XML files, although other HTML files such as full.dataTables.html are generated fine. The error I am getting is:

Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54-NEXT-SNAPSHOT/pmcstop.txt
Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54-NEXT-SNAPSHOT/stopwords.txt
PMC3561042 .PMC6517453 !wPMC6695746 PMC7102705 PMC7119083 PMC7120695 PMC7197577 PMC7241517 PMC7341712 !wPMC7395586 ..... create data tables Null pluginOption'

Dictionary Used: DICTIONARY_COUNTRY

Command: ami -p ami_12_08_2020/try_for_ami_search_1 search --dictionary ami_12_08_2020/country_final.xml

Output: ALL OUTPUT FILES
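
One way to pin down exactly which output files are empty is to walk the project folder and list the zero-byte files. The following is a minimal sketch and not part of ami: the function name `find_empty_files` and the suffix filter are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def find_empty_files(project_dir, suffixes=(".csv", ".xml", ".html")):
    """Return sorted relative paths of zero-byte files with the given suffixes.

    Hypothetical helper: it only inspects file sizes and knows nothing about ami.
    """
    root = Path(project_dir)
    empty = []
    for path in root.rglob("*"):
        # Only regular files with a matching suffix and a size of 0 bytes count.
        if path.is_file() and path.suffix.lower() in suffixes and path.stat().st_size == 0:
            empty.append(path.relative_to(root).as_posix())
    return sorted(empty)
```

Calling `find_empty_files("ami_12_08_2020/try_for_ami_search_1")` after the search would give a list that can be pasted into the issue. Note that a file can be non-empty and still contain no hits, so this only catches the zero-byte case.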

AmbrineH avatar Aug 12 '20 14:08 AmbrineH

The first paragraph refers to ami search but the second refers to amidict. Can you please clarify which? If there are two issues, please separate them.

petermr avatar Aug 13 '20 15:08 petermr

Did you create a dictionary successfully with amidict? If so, this should be independent of using ami search. If the search fails with every dictionary, then the problem is with ami search. If it fails only with the one created by amidict, please make sure that dictionary is uploaded. There seems to be a problem with the cooccurrence for some people and not others. This may be because the dictionary is not correct. Ideally I need:

  • ami -p P search --dictionary A // works
  • ami -p P search --dictionary B // does not work

Then it's probably the dictionary.

P.
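
The A/B comparison described above could be scripted roughly as follows. This is a sketch under assumptions: the helper names `build_search_command` and `run_comparison` are hypothetical, only the command form quoted in this thread is used, and it is not confirmed here that ami's exit code reflects empty output, so the generated files still need to be inspected.

```python
import subprocess


def build_search_command(project, dictionary):
    """Build the `ami search` invocation quoted in this thread as an argument list."""
    return ["ami", "-p", str(project), "search", "--dictionary", str(dictionary)]


def run_comparison(project, dictionaries, runner=subprocess.run):
    """Run the same search once per dictionary and collect each exit code.

    `runner` is injectable so the logic can be exercised without ami installed.
    On Windows the `ami` launcher may need to be invoked through a shell instead.
    """
    exit_codes = {}
    for dictionary in dictionaries:
        completed = runner(build_search_command(project, dictionary))
        exit_codes[dictionary] = completed.returncode
    return exit_codes
```

For example, `run_comparison("P", ["A", "B"])` runs both searches on the same CProject. Running each on a fresh copy of the project may be safer, in case the second run overwrites results from the first.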

petermr avatar Aug 13 '20 15:08 petermr

I am sorry, I copied the wrong command. It is supposed to be about ami search only. I have updated the issue. My apologies for the inconvenience.

The query works fine with the inbuilt dictionary but not with my own (which I created using a SPARQL query and then converted using amidict). I shall certainly try other variants as well, but I believe @Priya-Jk-15 has the same issue.

AmbrineH avatar Aug 13 '20 15:08 AmbrineH

If you can post a list of the files you used (please make them available on github), the commands, and the problem I will try to solve it.

P.


-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

petermr avatar Aug 13 '20 17:08 petermr

I used the following query to create a new corpus and ran it several times, varying the number of downloaded articles from 10 to 950.

GET PAPERS QUERY: getpapers -q "viral epidemics" -o ami_12_08_2020/try_for_ami_search_1 -f v_epid/log.txt -x -p -k 10

AMI SEARCH: ami -p ami_12_08_2020/try_for_ami_search_1 search --dictionary ami_12_08_2020/country_final.xml

I have committed the smaller folder here: TEST_FOLDER_WITH_RESULT

Dictionary Used: DICTIONARY_COUNTRY

AmbrineH avatar Aug 14 '20 05:08 AmbrineH

@Prasinus818 When I ran ami search, I gave neither the path nor the folder name of the disease dictionary. I only gave --dictionary disease, and I think it used only the inbuilt dictionary, because when I replaced disease with drug and virus, ami search didn't create DataTables. Can ami search find a dictionary from its folder name? I ask because I think you have used the country dictionary's folder name, country_final.xml. Kindly clarify.

Priya-Jk-15 avatar Aug 14 '20 07:08 Priya-Jk-15

PLEASE use one issue per topic. This issue contains material on getpapers, ami search, and amidict. An issue reporting a bug should contain the minimal information needed to reproduce the bug.

I suggest opening new issue(s) which contain a precise statement of the problem. It helps if all the files are small, e.g. 10 CTrees for a CProject.

petermr avatar Aug 14 '20 08:08 petermr

I created a small disease dictionary with 10 entries which is at https://github.com/petermr/openVirus/blob/master/dictionaries/diseases/amisearch%20issue/disease.xml . I validated the dictionary using the syntax amidict -v --dictionary dic display --fields --validate.

disease validate (10)

Then I used the dictionary for ami search on a corpus of 5 CTrees. It created only empty DataTables, which are at https://github.com/petermr/openVirus/tree/master/dictionaries/diseases/amisearch%20issue/virepi

@petermr please check it.

Priya-Jk-15 avatar Aug 14 '20 12:08 Priya-Jk-15

@petermr ,

  • I am trying to use ami search for the customised dictionary funder which is committed at : https://github.com/petermr/openVirus/blob/master/dictionaries/funders/funder.xml

  • I tested it on a corpus of 10 articles which were downloaded using the getpapers query : getpapers -q "viral epidemic" -o minicorpus10 -x -k 10

  • The ami search command I used was : ami -p minicorpus10 search --dictionary C:\Users\me\funder.xml

  • It did not create full.datatables.html, and the _cooccurrence folder was empty. When I searched the same corpus with the built-in dictionary funders, it worked well. This suggests that the corpus is fine, but sadly my dictionary isn't.
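
To narrow down what differs between a custom dictionary and a built-in one, it can help to list what the file actually contains. The following is a minimal sketch using only the Python standard library. It assumes the dictionary is an XML file whose root element carries a `title` attribute and whose `entry` children carry a `term` attribute; that layout is an assumption, not something confirmed in this thread, and the function name `summarize_dictionary` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_dictionary(xml_text):
    """Summarize dictionary XML (str or bytes): title, entry count, blank and duplicate terms.

    Assumes <dictionary title="..."> with <entry term="..."/> children (an assumption).
    Namespaced element names would not be matched by this simple lookup.
    """
    root = ET.fromstring(xml_text)
    terms = [entry.get("term", "").strip() for entry in root.iter("entry")]
    # Compare terms case-insensitively when looking for duplicates.
    counts = Counter(term.lower() for term in terms if term)
    return {
        "title": root.get("title"),
        "entries": len(terms),
        "missing_term": sum(1 for term in terms if not term),
        "duplicates": sorted(term for term, n in counts.items() if n > 1),
    }
```

It could be called as `summarize_dictionary(Path("funder.xml").read_bytes())` with `Path` imported from `pathlib`. Comparing the summary for a working dictionary with the one for a failing dictionary may point to a structural difference.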

vaishaliarora277 avatar Aug 20 '20 15:08 vaishaliarora277

Thanks, that's a very clear summary. I will try to add summarization to the results of search. It will take 2-3 hours...


petermr avatar Aug 21 '20 11:08 petermr

@petermr I validated my created dictionary and got the following output:

Generic values (DictionaryDisplayTool)
================================
-v to see generic values
Specific values (DictionaryDisplayTool)
================================
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@14751b3
--fields            : d        []
--files             : d        []
--maxEntries        : d         3
--remote            : d [https://github.com/petermr/dictionary]
--suffix            : d       xml
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d [disease]
--directory         : d       dic
Dictionary: disease
entries: 13814
    myopia
    psychosis
    psychosis
    ....

Then I used the dictionary to search a corpus of 10 CTrees. I got _cooccurrence, and the results folder has results.xml, but full.datatables.html has only frequencies. The results are at https://github.com/petermr/openVirus/tree/master/examples/Priya/amisearch_issue/virepi .

The tree of the corpus:

Folder PATH listing for volume OS
Volume serial number is 845F-351F
C:.
├───10.1101
│   ├───2020.06.10.20127597
│   ├───results
│   │   └───search
│   │       └───disease
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───10.11012020.06.10.20127597
│   ├───results
│   │   └───search
│   │       └───disease
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC6517453
│   ├───results
│   │   └───search
│   │       └───disease
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC6695746
│   ├───results
│   │   ├───search
│   │   │   └───disease
│   │   └───word
│   │       └───frequencies
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC7119083
│   ├───results
│   │   ├───search
│   │   │   └───disease
│   │   └───word
│   │       └───frequencies
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC7409732
│   ├───results
│   │   ├───search
│   │   │   └───disease
│   │   └───word
│   │       └───frequencies
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───search.openVirus
│   ├───dictionaries
│   │   └───diseases
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
└───__cooccurrence
    ├───disease
    └───disease-disease
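
In the tree above, only some folders contain results/word/frequencies alongside results/search. A short script can report this per folder instead of reading the listing by eye. This is a sketch only: the function name `results_by_ctree` and the default list of expected subfolders are hypothetical and taken from the listing above, not from ami's documentation.

```python
from pathlib import Path


def results_by_ctree(project_dir, expected=("results/search", "results/word/frequencies")):
    """Map each top-level folder name to the expected subfolders it is missing."""
    report = {}
    for child in sorted(Path(project_dir).iterdir()):
        if not child.is_dir() or child.name.startswith("__"):
            continue  # skip files and summary folders such as __cooccurrence
        report[child.name] = [rel for rel in expected if not (child / rel).is_dir()]
    return report
```

An empty list for a folder means every expected subfolder is present. Any other top-level directory, such as the stray search.openVirus folder in the listing, is reported too, which makes misplaced folders easy to spot.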

Priya-Jk-15 avatar Aug 22 '20 12:08 Priya-Jk-15

I was able to create the dictionary and validate it. Please let me know whether the validation results are correct, @petermr:

Generic values (DictionaryDisplayTool)
================================
-v to see generic values

Specific values (DictionaryDisplayTool)
================================
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@2b4c1d96
--fields            : d        []
--files             : d        []
--maxEntries        : d         3
--remote            : d [https://github.com/petermr/dictionary]
--suffix            : d       xml
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d [country]
--directory         : d ami_12_08_2020\amidict10

Dictionary: country

entries: 263
    Afghanistan
    Albania
    Algeria
    ....

I will update the results in the wiki if they are fine.

AmbrineH avatar Aug 22 '20 13:08 AmbrineH

At the moment, if there are no error messages, the dictionary is probably fine.


petermr avatar Aug 22 '20 13:08 petermr

@petermr Please check my comment above regarding this issue. I still think full.datatables.html needs some changes.

Priya-Jk-15 avatar Aug 27 '20 02:08 Priya-Jk-15