Pubmed-Batch-Download failed to fetch


Open antoine4ucsd opened this issue 5 years ago • 7 comments

Hello, I just installed the two required packages and tried to fetch a couple of references (using either my own PMIDs or example_pmf.tsv), but I get the following errors. Any suggestions? Thanks!

$ python fetch_pdfs.py -pmf example_pmf.tsv
~/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
  utils.DeprecatedIn23,
Trying to fetch pmid 27547345
** fetching of reprint 27547345 failed from error 'NoneType' object has no attribute 'readline'

May 02 '19 23:05 antoine4ucsd

I finally installed it. It works for some PMIDs, but sometimes it does not work (empty PDFs) even though it reports success. Thoughts? Thank you again.

python fetch_pdfs.py -pmids 30689769
Trying to fetch pmid 30689769
Trying genericCitationLabelled
** fetching reprint using the 'generic citation labelled' finder...
** fetching of reprint 30689769 succeeded

May 03 '19 02:05 antoine4ucsd

I just fixed another issue that may have been underlying this one. I will push the update tomorrow-ish, at which point you can try it and let me know if it works. It will require a new Python install, since I needed to migrate to Python 3.

Jun 12 '19 18:06 billgreenwald

Thanks!


Jun 12 '19 18:06 antoine4ucsd

The new version is live. You will need to install the new version of Python, but I updated the .yml file so you can use it to create a new conda environment easily. Let me know if it works or if you have other questions.

Jun 13 '19 18:06 billgreenwald

If I understand this error correctly, it happens on a subset of my articles too (around 25%, from different journals). The PDF is relatively small and cannot be opened with any reader. Other such IDs are 10839994, 8146161, etc.

This usually happens when the journal website responds with something other than a PDF. Check out the returned file on my Google Drive: https://drive.google.com/file/d/1FNHuSSE1ndmeNotYjJS-m7nY4TNoUjc8/view?usp=sharing Even though it has a .pdf name, it is actually a text file. That is OK in itself, since it means the user didn't have access anyway, but it seems to be what causes the misleading "succeeded" message.
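A quick way to confirm this on files that have already been downloaded is to look at the first bytes of each one: a PDF is expected to start with the header `%PDF-`, while an HTML or plain-text error page will not. The sketch below assumes the downloads sit as `*.pdf` files in a single folder; `looks_like_pdf` and `find_fake_pdfs` are hypothetical helpers, not part of fetch_pdfs.py.

```python
from pathlib import Path
from typing import List

# The PDF specification places this header at the start of the file.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path: Path) -> bool:
    """Return True if the file begins with the PDF header bytes."""
    with path.open("rb") as fh:
        return fh.read(len(PDF_MAGIC)) == PDF_MAGIC


def find_fake_pdfs(folder: str) -> List[Path]:
    """List *.pdf files in `folder` that are empty or lack the PDF header.

    These are typically HTML or plain-text pages (e.g. an access-denied
    notice) that were saved under a .pdf name.
    """
    return sorted(p for p in Path(folder).glob("*.pdf") if not looks_like_pdf(p))
```

Calling `find_fake_pdfs("path/to/output")` returns the suspicious files in sorted order. This is a heuristic: some readers tolerate stray bytes before the header, so a flagged file is worth a manual look rather than automatic deletion.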

Jun 14 '19 15:06 OgnjenMilicevic

FYI, I'm having trouble with the new .yml file (running on a Mac).

conda env create -f pubmed-batch-downloader-py3.yml

Collecting package metadata: done
Solving environment: failed

ResolvePackageNotFound:
  - openssl==1.1.1b=h14c3975_1
  - tk==8.6.9=hed695b0_1002
  - python==3.7.3=h5b0a415_0
  - libstdcxx-ng==9.1.0=hdf63c60_0
  - libffi==3.2.1=he1b5a44_1006
  - cryptography==2.7=py37h72c5cf5_0
  - sqlite==3.28.0=h8b20d00_0
  - xz==5.2.4=h14c3975_1001
  - cffi==1.12.3=py37h8022711_0
  - zlib==1.2.11=h14c3975_1004
  - readline==7.0=hf8c457e_1001
  - ncurses==6.1=hf484d3e_1002
  - bzip2==1.0.6=h14c3975_1002
  - libgcc-ng==9.1.0=hdf63c60_0
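Two of the unresolved packages (libgcc-ng, libstdcxx-ng) are Linux-only, and every entry carries a build string such as `h14c3975_1` that is tied to the platform the file was exported on, which is a likely reason the solver fails on macOS. One possible workaround, sketched here under that assumption, is to strip the build strings (and optionally drop Linux-only packages) before creating the environment. The helper names are hypothetical; re-exporting with `conda env export --no-builds` avoids build strings in the first place.

```python
import re
from typing import Iterable, List

# Matches conda dependency lines pinned with a build string, e.g.
#   "  - openssl=1.1.1b=h14c3975_1"  or  "  - python==3.7.3=h5b0a415_0"
# Lines without a build string (including pip entries) do not match.
PINNED_WITH_BUILD = re.compile(
    r"^(?P<prefix>\s*-\s*)(?P<name>[A-Za-z0-9_.\-]+)(?P<op>==?)"
    r"(?P<version>[^=\s]+)=(?P<build>[^=\s]+)\s*$"
)


def strip_build_strings(lines: Iterable[str], drop: Iterable[str] = ()) -> List[str]:
    """Remove the trailing '=build' from pinned dependency lines.

    Pinned lines whose package name is in `drop` (e.g. Linux-only packages)
    are removed entirely; all other lines are passed through unchanged.
    """
    drop_set = set(drop)
    out = []
    for line in lines:
        m = PINNED_WITH_BUILD.match(line.rstrip("\n"))
        if m is None:
            out.append(line)
        elif m["name"] not in drop_set:
            out.append(f"{m['prefix']}{m['name']}{m['op']}{m['version']}\n")
    return out


def make_portable(src_path: str, dst_path: str, drop: Iterable[str] = ()) -> None:
    """Write a copy of the environment file at `src_path` without build strings."""
    with open(src_path) as src:
        lines = src.readlines()
    with open(dst_path, "w") as dst:
        dst.writelines(strip_build_strings(lines, drop))
```

For example, `make_portable("pubmed-batch-downloader-py3.yml", "pubmed-batch-downloader-py3-portable.yml", drop={"libgcc-ng", "libstdcxx-ng"})` writes a relaxed copy (the output filename is arbitrary) that can then be passed to `conda env create -f`. If a pinned version is not published for macOS, loosening that version pin is the next step.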

Jun 17 '19 23:06 antoine4ucsd

Hey everyone, sorry for the long delay in replying to this issue.

@OgnjenMilicevic you are correct that the issue is non-PDF data being saved as a PDF. Do you know of a way to validate the incoming data to check whether it is a PDF? I will explore this too; right now I simply assume it is a PDF and force the incoming bytes into a file with a .pdf extension.
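One way to do this is to validate the payload before writing it instead of trusting the extension: check for the `%PDF-` magic bytes at the start of the response and, as a secondary signal, reject responses whose Content-Type is text (for example `text/html`). This is not how fetch_pdfs.py currently works; `is_pdf_payload` and `save_if_pdf` are hypothetical helpers a fetcher could call so that it reports a failure rather than "succeeded" when a journal returns an error page.

```python
from pathlib import Path
from typing import Optional

PDF_MAGIC = b"%PDF-"  # header that a PDF file is expected to start with


def is_pdf_payload(content: bytes, content_type: Optional[str] = None) -> bool:
    """Heuristically decide whether downloaded bytes are a PDF.

    The magic bytes are the main test. The Content-Type header, if given, is
    only used to reject text responses such as HTML error pages; it is not
    required to be 'application/pdf', since some servers send PDFs as
    'application/octet-stream'.
    """
    if content_type:
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime.startswith("text/"):
            return False
    return content.startswith(PDF_MAGIC)


def save_if_pdf(content: bytes, path: str, content_type: Optional[str] = None) -> bool:
    """Write `content` to `path` only if it looks like a PDF; return True if saved."""
    if not is_pdf_payload(content, content_type):
        return False
    Path(path).write_bytes(content)
    return True
```

With the `requests` library, for instance, the call would be `save_if_pdf(r.content, path, r.headers.get("Content-Type"))`; `content` and `headers` are standard `requests.Response` attributes, but whether the downloader uses `requests` is an assumption here.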

@antoine4ucsd I am currently working on the Python environment and will get back to you soon.

Oct 03 '19 15:10 billgreenwald
