Pubmed-Batch-Download

failed to fetch

Open antoine4ucsd opened this issue 5 years ago • 7 comments

Hello, I just installed the 2 required packages and tried to fetch a couple of refs (using either my own PMIDs or the example_pmf.tsv), but I get the following errors. Any suggestions? Thanks!

$ python fetch_pdfs.py -pmf example_pmf.tsv
~/anaconda2/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
  utils.DeprecatedIn23,
Trying to fetch pmid 27547345
** fetching of reprint 27547345 failed from error 'NoneType' object has no attribute 'readline'

antoine4ucsd avatar May 02 '19 23:05 antoine4ucsd

I finally installed it. It works for some PMIDs, but sometimes it does not (the PDFs are empty) even though it says that it worked. Thoughts? Thank you again.

python fetch_pdfs.py -pmids 30689769
Trying to fetch pmid 30689769
Trying genericCitationLabelled
** fetching reprint using the 'generic citation labelled' finder...
** fetching of reprint 30689769 succeeded

antoine4ucsd avatar May 03 '19 02:05 antoine4ucsd

I just fixed another issue that may have been underlying this issue. I will push the update tomorrow-ish, at which point you can try it and let me know if it works. It will require a new python install since I needed to migrate to python3

billgreenwald avatar Jun 12 '19 18:06 billgreenwald

thanks! a

antoine4ucsd avatar Jun 12 '19 18:06 antoine4ucsd

the new version is live -- let me know if it works. You will need to install the new version of python, but I updated the .yml so you can use it to make a new conda environment easily. Let me know if it works or if you have other questions.

billgreenwald avatar Jun 13 '19 18:06 billgreenwald

If I understand this error properly, it happens on a subset of my articles too (around 25% from different journals). The PDF is relatively small, and cannot be opened with any readers. Other such IDs are 10839994, 8146161 etc.

This usually happens when the journal website responds with something other than a PDF. Check out the returned file on my Google Drive: https://drive.google.com/file/d/1FNHuSSE1ndmeNotYjJS-m7nY4TNoUjc8/view?usp=sharing Even though it has a .pdf extension, it is actually a text file. That in itself is OK, as it means the user didn't have access anyway, but this seems to be what is confusing the output.

OgnjenMilicevic avatar Jun 14 '19 15:06 OgnjenMilicevic

FYI, having trouble with the new yml file (running on a Mac).

conda env create -f pubmed-batch-downloader-py3.yml

Collecting package metadata: done
Solving environment: failed

ResolvePackageNotFound:

  • openssl==1.1.1b=h14c3975_1
  • tk==8.6.9=hed695b0_1002
  • python==3.7.3=h5b0a415_0
  • libstdcxx-ng==9.1.0=hdf63c60_0
  • libffi==3.2.1=he1b5a44_1006
  • cryptography==2.7=py37h72c5cf5_0
  • sqlite==3.28.0=h8b20d00_0
  • xz==5.2.4=h14c3975_1001
  • cffi==1.12.3=py37h8022711_0
  • zlib==1.2.11=h14c3975_1004
  • readline==7.0=hf8c457e_1001
  • ncurses==6.1=hf484d3e_1002
  • bzip2==1.0.6=h14c3975_1002
  • libgcc-ng==9.1.0=hdf63c60_0
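A note on the list above: every entry carries a build string (the part after the second `=`), and build strings are platform-specific, while libgcc-ng and libstdcxx-ng are Linux-only packages. That would explain why the solver cannot find them on a Mac. As a rough sketch of a workaround (not something the repository provides), the exported yml could be rewritten without build strings. The function name `make_portable` and the `LINUX_ONLY` set are hypothetical, and the sketch assumes the file uses the usual `name=version=build` form that `conda env export` writes; other entries in the real file may still need manual attention.

```python
# Hypothetical helper: strip platform-specific build strings from a conda
# environment file so the solver can pick builds for the local platform.

# Assumption: these two packages from the error above only exist on Linux.
LINUX_ONLY = {"libgcc-ng", "libstdcxx-ng"}


def make_portable(yml_text: str) -> str:
    """Return the yml text with build strings and Linux-only packages removed."""
    out = []
    for line in yml_text.splitlines():
        stripped = line.strip()
        # Only touch conda dependency entries such as "- python=3.7.3=h5b0a415_0".
        # pip-style pins ("- requests==2.22.0") are left unchanged.
        if stripped.startswith("- ") and "=" in stripped and "==" not in stripped:
            parts = stripped[2:].strip().split("=")
            name = parts[0]
            if name in LINUX_ONLY:
                continue  # drop packages that have no macOS build
            indent = line[: len(line) - len(line.lstrip())]
            # Keep name and version, discard the build string if present.
            line = f"{indent}- {'='.join(parts[:2])}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = (
        "dependencies:\n"
        "  - python=3.7.3=h5b0a415_0\n"
        "  - libgcc-ng=9.1.0=hdf63c60_0\n"
    )
    print(make_portable(example))
```

When the environment can be re-exported on the original machine, `conda env export --no-builds` avoids the build strings in the first place.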

antoine4ucsd avatar Jun 17 '19 23:06 antoine4ucsd

Hey everyone, sorry for the long delay in replying to this issue.

@OgnjenMilicevic you are correct that the issue is non-PDF data being saved as a PDF. Do you know of a way to validate the incoming data to see if it is a PDF? I will explore this too -- right now I simply assume it will be a PDF and write the bytes coming through to a file with a .pdf extension regardless.
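One option to explore, as a minimal sketch only: PDF files begin with the marker `%PDF-`, so the first bytes of the response can be checked before anything is written to disk. The function names `looks_like_pdf` and `save_if_pdf` are made up for illustration and are not part of fetch_pdfs.py. The 1024-byte window is an assumption based on how lenient PDF readers tend to be about leading junk; a stricter check would require the marker at offset 0.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Heuristic check: does the payload carry the '%PDF-' header marker?"""
    # Assumption: accept the marker anywhere in the first 1024 bytes,
    # since some files have a few stray bytes before the header.
    return b"%PDF-" in data[:1024]


def save_if_pdf(data: bytes, path: str) -> bool:
    """Write data to path only if it looks like a PDF; return True on success."""
    if not looks_like_pdf(data):
        # Likely an HTML or text response (e.g. a paywall or error page).
        return False
    with open(path, "wb") as handle:
        handle.write(data)
    return True


if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.4\n"))
    print(looks_like_pdf(b"<html><body>Access denied</body></html>"))
```

With a check like this, a text or HTML response would be reported as a failed fetch instead of being saved as an unreadable .pdf file. It only inspects the header, so a truncated or corrupted PDF would still pass.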

@antoine4ucsd I am currently working on the Python environment and will get back to you soon.

billgreenwald avatar Oct 03 '19 15:10 billgreenwald