openVirus

Testing medrxiv python/ferret downloader

Open petermr opened this issue 4 years ago • 21 comments

petermr avatar May 03 '20 13:05 petermr

Created a file with search_download_medrxiv.py

Problems running with both python3

pm286macbook:ferret pm286$ python3 search_download_medrxiv.py "n95" n95
Traceback (most recent call last):
  File "search_download_medrxiv.py", line 24, in <module>
    query=urllib.quote(sys.argv[1])
AttributeError: module 'urllib' has no attribute 'quote'

and python2

pm286macbook:ferret pm286$ python search_download_medrxiv.py "n95" n95
Running file medrxiv_search_download.fql downloading files to n95
Traceback (most recent call last):
  File "search_download_medrxiv.py", line 28, in <module>
    cmd = ferret + " --param=url:\\\""+query_url+"\\\"  --param=dir:\\\""+output_folder+"\\\" " + fql_file
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
pm286macbook:ferret pm286$ 
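
Note on the python3 traceback above: `urllib.quote` moved to `urllib.parse.quote` in Python 3. A minimal sketch of an import that works under both interpreters, assuming the script only needs `quote`. The `build_query_url` helper is illustrative and is not taken from search_download_medrxiv.py; the base URL is the medRxiv search URL used later in this thread.

```python
try:
    from urllib.parse import quote  # Python 3 location
except ImportError:
    from urllib import quote  # Python 2 location


def build_query_url(term, base="https://www.medrxiv.org/search/"):
    """Percent-encode a search term and append it to the search URL."""
    # build_query_url is a hypothetical helper, named here for illustration.
    return base + quote(term)


print(build_query_url("n95 masks"))
```

With this import the same line, `query = quote(sys.argv[1])`, would run under either python or python3.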


petermr avatar May 03 '20 13:05 petermr

I should have added a note that it is in python2, because I assumed that's the default installation on most machines. I can switch it to python3 if it's easier. The error occurred because ferret wasn't installed. I've put a more descriptive failure message in. You would need to follow these steps to install it on macOS: https://github.com/petermr/openVirus/wiki/Ferret

l-hawizy avatar May 03 '20 14:05 l-hawizy

Thanks, I will be intelligent but casual so I pick up undefined operations. P.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/petermr/openVirus/issues/51#issuecomment-623117148, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFTCSY74D33PTJXWK5GS3LRPV4PTANCNFSM4MYD5PWQ .

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

petermr avatar May 03 '20 15:05 petermr

I think I already had ferret installed:

Welcome to Ferret REPL 0.10.2
Please use `exit` or `Ctrl-D` to exit this program.
>  

petermr avatar May 03 '20 16:05 petermr

ah great then all you would need is these two commands:

alias ferret="/your/local/directory/ferret_darwin_x86_64/ferret"
export FERRET=ferret
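
Note: the earlier `'NoneType' and 'str'` error is what string concatenation produces when the `FERRET` variable is unset. A rough sketch of how a wrapper could resolve the executable and fail with a clearer message. It assumes Python 3 (`shutil.which`), and `find_ferret` is a hypothetical helper, not the function in search_download_medrxiv.py. One caveat: a shell alias is not inherited by a Python subprocess, so the value has to be an executable path or a name found on PATH.

```python
import os
import shutil


def find_ferret(env=None):
    """Return the path of the ferret executable, or raise a descriptive error."""
    env = os.environ if env is None else env
    # Prefer the FERRET variable; fall back to the plain command name.
    candidate = env.get("FERRET") or "ferret"
    # shutil.which accepts either a path to an executable or a bare name
    # that it looks up in the given PATH string.
    resolved = shutil.which(candidate, path=env.get("PATH"))
    if resolved is None:
        raise RuntimeError(
            "Cannot find ferret (tried %r). Set FERRET to the full path of the "
            "binary or put its directory on PATH." % candidate
        )
    return resolved
```

Setting `export FERRET=/your/local/directory/ferret_darwin_x86_64/ferret` (the full path rather than the alias name) would satisfy this check.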

l-hawizy avatar May 03 '20 17:05 l-hawizy

also pull the latest changes

l-hawizy avatar May 03 '20 17:05 l-hawizy

will try tomorrow when brain is working.

petermr avatar May 03 '20 22:05 petermr

Or, the easier way: just run the ferret command directly:

ferret --param=url:\"https://www.medrxiv.org/search/n95\" --param=dir:\"n95\" medrxiv_search_download.fql

l-hawizy avatar May 04 '20 07:05 l-hawizy

Thanks. Can you set out all the precise steps needed, so I can restart from scratch?

I currently have installed FERRET and get:

pm286macbook:~ pm286$ FERRET
Welcome to Ferret REPL 0.10.2
Please use `exit` or `Ctrl-D` to exit this program.
> ^C

petermr avatar May 04 '20 07:05 petermr

So that's set up correctly. If you have the environment variable FERRET, run:

$FERRET --param=url:\"https://www.medrxiv.org/search/n95\" --param=dir:\"n95\" medrxiv_search_download.fql

If you have the alias ferret instead, run:

ferret --param=url:\"https://www.medrxiv.org/search/n95\" --param=dir:\"n95\" medrxiv_search_download.fql

l-hawizy avatar May 04 '20 08:05 l-hawizy

Where is medrxiv_search_download.fql? Is it on the github.com/petermr/openVirus site?

petermr avatar May 04 '20 12:05 petermr

Please can we have a protocol where:

  • all steps are given in a single place
  • all data is in named files, not text files on the wiki

i.e. we can hand a single URL to a newcomer that gives

  • all the instructions,
  • coverage of all likely platforms,
  • tests that the installation has worked,
  • simple example scripts or datasets,
  • expected behaviour.

Then I'll be happy to act as alpha tester :-)

petermr avatar May 04 '20 12:05 petermr

Cannot run ferret or python wrapper.

Running from: https://github.com/petermr/openVirus/tree/ferret/ferret

environment

pm286macbook:ferret pm286$ git branch
* ferret
  master
pm286macbook:ferret pm286$ pwd
/Users/pm286/projects/openVirus/ferret
pm286macbook:ferret pm286$ ls
README.md			get_data_biorxiv.fql		medrxiv_search_download.fql	scrape.py			search_biorxiv.py
ferret.log			get_data_springer.fql		redalyc.fql			search.fql			search_download_medrxiv.py
pm286macbook:ferret pm286$ python --version
Python 2.7.16

python wrapper

pm286macbook:ferret pm286$ python search_download_medrxiv.py "n95 masks" testn95
Running file medrxiv_search_download.fql downloading files to testn95
Traceback (most recent call last):
  File "search_download_medrxiv.py", line 33, in <module>
    subprocess.check_output(cmd, shell=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'ferret --param=url:\"https://www.medrxiv.org/search/n95%20masks\"  --param=dir:\"testn95\" medrxiv_search_download.fql' returned non-zero exit status 1
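
Note: `CalledProcessError` only reports the exit status here, so whatever ferret itself printed is lost. A small sketch of a wrapper that keeps the tool's output in the error message. It assumes Python 3.7+ (`capture_output`, `text`); `run_and_report` is a hypothetical name, and the demo uses the Python interpreter as a stand-in so it runs without ferret installed.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run a command; on failure, raise an error that includes its output."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            "command failed with exit status {}\nstdout: {}\nstderr: {}".format(
                result.returncode, result.stdout.strip(), result.stderr.strip()
            )
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command; a real call would pass the ferret argument list.
    print(run_and_report([sys.executable, "-c", "print('ok')"]))
```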

running raw ferret

pm286macbook:ferret pm286$ ferret --param=url:"https://www.medrxiv.org/search/n95%252Bmasks" --param=dir:"n95" medrxiv_search_download.fql
https://www.medrxiv.org/search/n95%252Bmasks
invalid character 'h' looking for beginning of value
pm286macbook:ferret pm286$ 
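
Note: the two attempts above do not send the same search string. The wrapper produced `n95%20masks`, while the hand-typed URL uses `n95%252Bmasks`, which is what percent-encoding `n95+masks` twice gives. Which form the medRxiv search page expects is not established in this thread; the sketch below (Python 3) only shows how the encodings relate.

```python
from urllib.parse import quote

space_once = quote("n95 masks")   # the space becomes %20
plus_once = quote("n95+masks")    # the '+' becomes %2B
plus_twice = quote(plus_once)     # the '%' becomes %25, giving %252B

print(space_once)
print(plus_once)
print(plus_twice)
```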

petermr avatar May 12 '20 13:05 petermr

@petermr this should work

ferret --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrxiv_search_download.fql

I wrapped https://www.medrxiv.org/search/n95%252Bmasks and n95 in quotation marks. Ferret treats parameters without quotes as numbers.
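
Note: from a Python wrapper, the backslash escaping can be avoided by passing an argument list instead of a shell string. This is a sketch under the assumption that string parameter values only need to arrive wrapped in double quotes, as described above; `json.dumps` is used here simply as a convenient way to add them, and `ferret_argv` is a hypothetical helper.

```python
import json


def ferret_argv(url, out_dir, fql_file, ferret="ferret"):
    """Build a ferret argument list with string parameters in double quotes."""
    return [
        ferret,
        "--param=url:" + json.dumps(url),
        "--param=dir:" + json.dumps(out_dir),
        fql_file,
    ]


argv = ferret_argv(
    "https://www.medrxiv.org/search/n95%252Bmasks",
    "n95",
    "medrxiv_search_download.fql",
)
print(argv)
```

Passing `argv` to `subprocess.run(argv)` without `shell=True` hands each element to ferret unchanged, so the quotes reach it without any shell-level escaping.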

3timeslazy avatar May 12 '20 14:05 3timeslazy

Thanks where is main.go?

On Tue, May 12, 2020 at 3:26 PM Vladimir Fetisov [email protected] wrote:

@petermr https://github.com/petermr this should work

go run main.go --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrixv_search_download.fql

petermr avatar May 12 '20 15:05 petermr

I get


pm286macbook:ferret pm286$ ls
README.md get_data_biorxiv.fql medrxiv_search_download.fql scrape.py search_biorxiv.py
ferret.log get_data_springer.fql redalyc.fql search.fql search_download_medrxiv.py
pm286macbook:ferret pm286$ go run main.go --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrixv_search_download.fql
stat main.go: no such file or directory
pm286macbook:ferret pm286$

petermr avatar May 12 '20 15:05 petermr

I edited my comment. Replace main.go with ferret.

I ran ferret from the source code and forgot to fix the command.

3timeslazy avatar May 12 '20 15:05 3timeslazy


ferret --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrxiv_search_download.fql
Failed to execute the query
initialize driver: failed to initialize driver: could not resolve IP for 127.0.0.1: DOCUMENT(baseUrl+"/search/vaccine",{driver:"cdp"}) at 2:13
pm286macbook:ferret pm286$

petermr avatar May 12 '20 17:05 petermr

To use the cdp driver, you need to start Google Chrome before running ferret. On macOS, that looks like:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

The need to launch Google Chrome before running ferret is the main reason worker (https://github.com/MontFerret/worker) was created.
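
Note: a quick way to confirm that Chrome is actually listening before running ferret is to try a TCP connection to the debugging port. A minimal Python 3 sketch, assuming the default host and port from the command above; `debug_port_open` is a hypothetical helper, and it only tests that something accepts connections, not that it is Chrome.

```python
import socket


def debug_port_open(host="127.0.0.1", port=9222, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


if __name__ == "__main__":
    print("Chrome debugging port reachable:", debug_port_open())
```

If this prints `False`, the cdp driver has nothing to connect to on that port.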

3timeslazy avatar May 12 '20 18:05 3timeslazy

Thanks, will try.

petermr avatar May 12 '20 21:05 petermr

OK, still not quite there:

pm286macbook:ferret pm286$ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

This brought up a new window which I left open

Opening in existing browser session.
pm286macbook:ferret pm286$ ferret --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrxiv_search_download.fql
Failed to execute the query
initialize driver: failed to initialize driver: could not resolve IP for 127.0.0.1: DOCUMENT(baseUrl+"/search/vaccine",{driver:"cdp"}) at 2:13
pm286macbook:ferret pm286$ 
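
Note: the "Opening in existing browser session." line suggests the command was handed to a Chrome instance that was already running, in which case the debugging flag may not have taken effect. That is a guess and is not confirmed in this thread. One thing to try is quitting Chrome completely first, or starting a separate instance with its own profile directory. The sketch below only builds such a command line; the Chrome path is the one quoted above, `--user-data-dir` is a standard Chrome switch, and the profile location and the `chrome_debug_argv` name are hypothetical.

```python
import tempfile

CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def chrome_debug_argv(port=9222, profile_dir=None, chrome=CHROME):
    """Build a Chrome command line that uses its own profile directory."""
    if profile_dir is None:
        # Hypothetical throwaway profile, so the running session is not reused.
        profile_dir = tempfile.mkdtemp(prefix="ferret-chrome-")
    return [
        chrome,
        "--remote-debugging-port={}".format(port),
        "--user-data-dir={}".format(profile_dir),
    ]


print(chrome_debug_argv(profile_dir="/tmp/ferret-chrome-profile"))
```

The list can be passed to `subprocess.Popen` to start Chrome in the background before running ferret.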

petermr avatar May 12 '20 22:05 petermr