
missing requirements

Open ottowg opened this issue 6 years ago • 3 comments

The following packages are missing from the requirements: beautifulsoup4, pdfminer, requests-html, and ray.

ottowg avatar Jan 08 '20 10:01 ottowg

Thank you -- BS4 was missing.

The others were there in requirements.txt:

  • pdfminer.six
  • ray
  • requests-html

But were there any problems using those three libraries?

ceteri avatar Jan 08 '20 11:01 ceteri

I had trouble installing everything correctly with Python 3.8 on Ubuntu (in a conda env), so I assumed the three packages were missing from requirements.txt. With Python 3.7 it seems to work fine.
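One way to tell a requirement that is missing from requirements.txt apart from one that failed to install is to ask the active environment directly. The following is a minimal sketch, not part of the rclc repo: the `REQUIRED` list and the `find_missing` helper are hypothetical names, the package names are simply the ones mentioned in this thread, and it assumes Python 3.8 or later, where `importlib.metadata` is in the standard library.

```python
from importlib import metadata

# Distribution names as mentioned in this thread (an assumption; adjust
# the list to match the actual requirements.txt).
REQUIRED = ["beautifulsoup4", "pdfminer.six", "ray", "requests-html"]


def find_missing(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            # Raises PackageNotFoundError when the distribution is absent.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

Running this inside the conda env after `pip install -r requirements.txt` lists any package that did not actually get installed, whatever the requirements file says.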

I was able to download 1379 of the 1662 pdfs. Is this a comparable result?

Thanks a lot for your help

Wolf

ottowg avatar Jan 08 '20 15:01 ottowg

Thank you @ottowg, it is super helpful to know about the Py 3.8 errors on Ubuntu.

I was able to download 1379 of the 1662 pdfs. Is this a comparable result?

Yes, that's the number that we saw for the PDF downloads without errors. There's a task in progress to troubleshoot the download process: #6

FWIW, we're running on Ubuntu on our cloud instances, although generally with Py 3.6. We'll try to troubleshoot further.

ceteri avatar Jan 13 '20 04:01 ceteri