pygetpapers
Unable to download a huge corpus of papers
Describe the bug
I was downloading XML and CSV files for all the papers published in 2021 for the query "Transcription factors". The limit was set to 100k papers and there were 99k hits. Ideally, the download should start, with a warning if needed, but instead it fails with this error:
TypeError: 'NoneType' object is not subscriptable
To Reproduce
Steps to reproduce the behaviour:
- In the Windows command prompt, type
pygetpapers -q "Transcription factors" -x -c -o TF_database_2021 -k 100000 --startdate 2021-01-01 --enddate 2021-12-31
- press 'Enter'
- Scroll down to the end
- See an error like
TypeError: 'NoneType' object is not subscriptable
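The full traceback is not included here, so the line in pygetpapers that fails is unknown. In general, Python raises this message whenever code indexes a value that turned out to be `None`. The sketch below only illustrates that mechanism and one possible guard; `read_hits`, `read_hits_guarded` and the `"hits"` key are hypothetical names, not taken from pygetpapers.

```python
def read_hits(response):
    """Index the response directly; fails if response is None."""
    return response["hits"]


def read_hits_guarded(response):
    """Return the hits, or an empty list when no response was received."""
    if response is None:
        # Hypothetical guard: treat a missing response as "no results".
        return []
    return response.get("hits", [])


def error_message_for(response):
    """Return the TypeError text raised by the unguarded lookup, if any."""
    try:
        read_hits(response)
    except TypeError as err:
        return f"TypeError: {err}"
    return None


if __name__ == "__main__":
    print(error_message_for(None))
    print(read_hits_guarded(None))
```

Running the reported command with the full traceback attached would show which value is actually `None`.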
Expected behaviour
Ideally, it should start the download of all the available XML and CSV files related to the query
Desktop (please complete the following information):
- OS: Windows 11
- Browser: Firefox
- Version: Firefox 95.0
Additional context
It usually works for a small corpus of about 100 to 1000 papers. For example, pygetpapers ran the above query smoothly for the year 2022 with the limit set to 1000 papers; the actual hits were only 458, and it downloaded a corpus of 458 papers with CSV and XML files. But for a huge corpus, usually >1k papers, it shows the above error message.
Can you check the same command in version 1.1.5?
Thanks both. I suggest that 100K is too large a chunk; maybe try 10K:
- a large request may put strain on the server and get you blocked
- when errors occur it may be difficult to locate the documents responsible, as we have here
- make sure you can actually analyze the downloaded material. If you can't process 10K, downloading 100K won't gain anything.
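One way to work in smaller chunks is to split the year into monthly date ranges and run one command per month. The sketch below only builds the command lines, using the flags from the command in the report (`-q`, `-x`, `-c`, `-o`, `-k`, `--startdate`, `--enddate`). The monthly split, the output directory naming and the helper names are assumptions, and it has not been checked that every month stays under the 10K limit.

```python
import calendar
from datetime import date


def monthly_ranges(year):
    """Yield (first_day, last_day) for each month of the given year."""
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, 1), date(year, month, last_day)


def build_commands(query, year, limit=10000, prefix="TF_database"):
    """Build one pygetpapers argument list per month (nothing is executed)."""
    commands = []
    for start, end in monthly_ranges(year):
        outdir = f"{prefix}_{start:%Y_%m}"  # hypothetical naming scheme
        commands.append([
            "pygetpapers", "-q", query, "-x", "-c",
            "-o", outdir, "-k", str(limit),
            "--startdate", start.isoformat(),
            "--enddate", end.isoformat(),
        ])
    return commands


def as_command_line(args):
    """Join arguments, double-quoting any that contain spaces."""
    return " ".join(f'"{a}"' if " " in a else a for a in args)


if __name__ == "__main__":
    for args in build_commands("Transcription factors", 2021):
        print(as_command_line(args))
```

Each printed line can be pasted into the command prompt one at a time, so that a failure points to a single month rather than the whole year.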
On Wed, Feb 23, 2022 at 3:46 PM Ayush Garg wrote:
Can you check the same command in version 1.1.5
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK