openVirus
Testing medrxiv python/ferret downloader
Created a file with search_download_medrxiv.py
Problems running with both python3
pm286macbook:ferret pm286$ python3 search_download_medrxiv.py "n95" n95
Traceback (most recent call last):
File "search_download_medrxiv.py", line 24, in <module>
query=urllib.quote(sys.argv[1])
AttributeError: module 'urllib' has no attribute 'quote'
and python2
pm286macbook:ferret pm286$ python search_download_medrxiv.py "n95" n95
Running file medrxiv_search_download.fql downloading files to n95
Traceback (most recent call last):
File "search_download_medrxiv.py", line 28, in <module>
cmd = ferret + " --param=url:\\\""+query_url+"\\\" --param=dir:\\\""+output_folder+"\\\" " + fql_file
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
pm286macbook:ferret pm286$
I should have added a note that it is written for Python 2, because I assumed that's the default installation on most machines. I can switch it to Python 3 if it's easier.
The error occurred because ferret wasn't installed. I've put a more descriptive failure message in. You would need to follow these steps to install it on macOS: https://github.com/petermr/openVirus/wiki/Ferret
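Both tracebacks above could be avoided inside the wrapper. A minimal sketch, assuming the script reads the ferret command from a FERRET environment variable; the names `build_search_url` and `ferret_command` are hypothetical and not taken from search_download_medrxiv.py:

```python
import os
import sys

try:
    from urllib.parse import quote  # Python 3 location
except ImportError:
    from urllib import quote  # Python 2 location

SEARCH_BASE = "https://www.medrxiv.org/search/"


def build_search_url(term):
    """Percent-encode the search term and append it to the medRxiv search URL."""
    return SEARCH_BASE + quote(term)


def ferret_command():
    """Return the command stored in $FERRET, or stop with a readable message."""
    ferret = os.environ.get("FERRET")  # assumption: the wrapper uses this variable
    if not ferret:
        sys.exit("FERRET is not set; see https://github.com/petermr/openVirus/wiki/Ferret")
    return ferret


if __name__ == "__main__":
    print(build_search_url("n95 masks"))  # https://www.medrxiv.org/search/n95%20masks
```

The try/except import lets the same file run under either interpreter, and the explicit check replaces the `NoneType` error with a message that says what is missing.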
Thanks, I will be intelligent but casual so I pick up undefined operations. P.
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
I think I already had ferret installed:
Welcome to Ferret REPL 0.10.2
Please use `exit` or `Ctrl-D` to exit this program.
>
ah great then all you would need is these two commands:
alias ferret="/your/local/directory/ferret_darwin_x86_64/ferret"
export FERRET=ferret
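One caveat with the alias: shell aliases are not passed on to child processes, so a Python wrapper that launches ferret through subprocess will not see it. Pointing FERRET at the full path of the binary, or putting the binary on PATH, avoids this. A sketch of how a wrapper could resolve the executable; `resolve_ferret` is a hypothetical helper, not part of the repository:

```python
import os


def resolve_ferret(env=None):
    """Return the path of an executable ferret binary, or None if none is found.

    Looks at $FERRET first (default name "ferret"); a bare name is searched
    for on $PATH, while a value containing a path separator is used as given.
    """
    env = os.environ if env is None else env
    candidate = env.get("FERRET", "ferret")
    if os.path.sep in candidate:
        ok = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        return candidate if ok else None
    for folder in env.get("PATH", "").split(os.pathsep):
        path = os.path.join(folder, candidate)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    print(resolve_ferret() or "ferret not found; set FERRET to the full path")
```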
also pull the latest changes
will try tomorrow when brain is working.
or the easier way just run the ferret command
ferret --param=url:\"https://www.medrxiv.org/search/n95\" --param=dir:\"n95\" medrxiv_search_download.fql
Thanks, Can you set out all the precise steps needed? So I can restart from scratch. Thanks.
I currently have installed FERRET and get:
pm286macbook:~ pm286$ FERRET
Welcome to Ferret REPL 0.10.2
Please use `exit` or `Ctrl-D` to exit this program.
> ^C
So that's set up correctly.
If you have the environment variable FERRET, run:
$FERRET --param=url:\"https://www.medrxiv.org/search/n95\" --param=dir:\"n95\" medrxiv_search_download.fql
and if you have the alias ferret, run:
ferret --param=url:\"https://www.medrxiv.org/search/n95\" --param=dir:\"n95\" medrxiv_search_download.fql
where is medrxiv_search_download.fql ? Is it on the github.com/petermr/openVirus site?
Please can we have a protocol where:
- all steps are given in a single place
- all data is in named files, not text files on the wiki
i.e. we can hand a single URL to a newcomer that gives
- all the instructions,
- try to cover all likely platforms,
- tests that the installation has worked
- simple example scripts or datasets
- expected behaviour
Then I'll be happy to act as alpha tester :-)
Cannot run ferret or the Python wrapper.
Running from: https://github.com/petermr/openVirus/tree/ferret/ferret
environment
pm286macbook:ferret pm286$ git branch
* ferret
master
pm286macbook:ferret pm286$ pwd
/Users/pm286/projects/openVirus/ferret
pm286macbook:ferret pm286$ ls
README.md get_data_biorxiv.fql medrxiv_search_download.fql scrape.py search_biorxiv.py
ferret.log get_data_springer.fql redalyc.fql search.fql search_download_medrxiv.py
pm286macbook:ferret pm286$ python --version
Python 2.7.16
python wrapper
pm286macbook:ferret pm286$ python search_download_medrxiv.py "n95 masks" testn95
Running file medrxiv_search_download.fql downloading files to testn95
Traceback (most recent call last):
File "search_download_medrxiv.py", line 33, in <module>
subprocess.check_output(cmd, shell=True)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 223, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'ferret --param=url:\"https://www.medrxiv.org/search/n95%20masks\" --param=dir:\"testn95\" medrxiv_search_download.fql' returned non-zero exit status 1
running raw ferret
pm286macbook:ferret pm286$ ferret --param=url:"https://www.medrxiv.org/search/n95%252Bmasks" --param=dir:"n95" medrxiv_search_download.fql
https://www.medrxiv.org/search/n95%252Bmasks
invalid character 'h' looking for beginning of value
pm286macbook:ferret pm286$
@petermr this should work:
ferret --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrxiv_search_download.fql
I wrapped https://www.medrxiv.org/search/n95%252Bmasks and n95 in quotation marks.
Ferret treats parameters without quotes as numbers.
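The message `invalid character 'h' looking for beginning of value` looks like a JSON parse error, which fits the explanation above: each --param value apparently has to be a valid JSON literal, so strings need their own double quotes. A sketch of how the Python wrapper could build the command as an argument list, letting json.dumps add the quotes so that no shell escaping is needed; `ferret_argv` is a hypothetical helper and the JSON interpretation is an assumption:

```python
import json


def ferret_argv(ferret, fql_file, **params):
    """Build an argv list for ferret with every parameter value JSON-encoded."""
    argv = [ferret]
    for name, value in sorted(params.items()):
        # json.dumps("n95") gives '"n95"', i.e. the quotes become part of the value
        argv.append("--param=%s:%s" % (name, json.dumps(value)))
    argv.append(fql_file)
    return argv


if __name__ == "__main__":
    argv = ferret_argv(
        "ferret",
        "medrxiv_search_download.fql",
        url="https://www.medrxiv.org/search/n95%20masks",
        dir="n95",
    )
    print(argv)
    # To run it without a shell in between:
    #   subprocess.check_output(argv)
```

Passing a list to subprocess without `shell=True` means the quotes reach ferret exactly as written, which removes the need for the backslash escaping used on the command line.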
Thanks. Where is main.go?
On Tue, May 12, 2020 at 3:26 PM Vladimir Fetisov [email protected] wrote:
@petermr this should work
go run main.go --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrixv_search_download.fql
I get
pm286macbook:ferret pm286$ ls
README.md get_data_biorxiv.fql medrxiv_search_download.fql scrape.py search_biorxiv.py
ferret.log get_data_springer.fql redalyc.fql search.fql search_download_medrxiv.py
pm286macbook:ferret pm286$ go run main.go --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrixv_search_download.fql
stat main.go: no such file or directory
pm286macbook:ferret pm286$
I edited my comment. Replace main.go with ferret.
I ran ferret from the source code and forgot to fix the command.
ferret --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrxiv_search_download.fql
Failed to execute the query
initialize driver: failed to initialize driver: could not resolve IP for 127.0.0.1: DOCUMENT(baseUrl+"/search/vaccine",{driver:"cdp"}) at 2:13
pm286macbook:ferret pm286$
To use the cdp driver, you need to start Google Chrome before running ferret.
On macOS it looks like:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
The need to launch Google Chrome before ferret is the main reason worker (https://github.com/MontFerret/worker) was created.
Thanks, will try.
OK, still not quite there:
pm286macbook:ferret pm286$ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
This brought up a new window which I left open
Opening in existing browser session.
pm286macbook:ferret pm286$ ferret --param=url:"\"https://www.medrxiv.org/search/n95%252Bmasks\"" --param=dir:"\"n95\"" medrxiv_search_download.fql
Failed to execute the query
initialize driver: failed to initialize driver: could not resolve IP for 127.0.0.1: DOCUMENT(baseUrl+"/search/vaccine",{driver:"cdp"}) at 2:13
pm286macbook:ferret pm286$
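"Opening in existing browser session" suggests that Chrome was already running, in which case the new process may simply hand over to the existing one and the --remote-debugging-port flag may never take effect. Quitting Chrome completely before relaunching it with the flag is worth trying. Whether anything is listening on the debugging port can be checked before ferret runs. A minimal sketch, assuming port 9222 from the command above; `port_is_open` is a hypothetical helper:

```python
import socket


def port_is_open(host="127.0.0.1", port=9222, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:
        return False
    conn.close()
    return True


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on 127.0.0.1:9222")
    else:
        print("Nothing is listening on 127.0.0.1:9222; start Chrome with "
              "--remote-debugging-port=9222 first")
```

This only confirms that the port accepts connections; it does not verify that the listener is Chrome or that ferret can drive it.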