dump-scraper
Error scraping old tweets
**pi@raspberrypi:**~/dump-scraper $ python dumpscraper.py --verbose scrapeold -s 2016-01-01 -u 2016-03-28
[DEBUG] =========================================
[DEBUG] Application Started
[DEBUG] =========================================
Dump Scraper 1.0.0 - A better way of scraping
Copyright (C) 2015-2016 FabbricaBinaria - Davide Tampellini
===============================================================================
Dump Scraper is Free Software, distributed under the terms of the GNU General
Public License version 3 or, at your option, any later version.
This program comes with ABSOLUTELY NO WARRANTY as per sections 15 & 16 of the
license. See http://www.gnu.org/licenses/gpl-3.0.html for details.
===============================================================================
[INFO] Processing day: 2016-03-27
[INFO] Processing day: 2016-03-25
Traceback (most recent call last):
File "dumpscraper.py", line 293, in <module>
scraper.run()
File "dumpscraper.py", line 278, in run
runner.run()
File "/home/pi/dump-scraper/lib/runner/scrapeold.py", line 109, in run
url = origurl + '&scroll_cursor=' + json_data['scroll_cursor']
KeyError: 'scroll_cursor'
**pi@raspberrypi:**~/dump-scraper $
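For reference, the crash happens because the JSON Twitter returns no longer contains a `scroll_cursor` key, so the lookup on line 109 of `scrapeold.py` raises `KeyError`. A defensive guard would at least let the run finish with whatever was fetched. This is only a sketch of such a guard, not the project's actual code; the function name `next_scrape_url` is hypothetical:

```python
import json

def next_scrape_url(origurl, raw_response):
    """Return the URL for the next page of results, or None when
    the response no longer contains a scroll cursor."""
    json_data = json.loads(raw_response)
    # Twitter's payload used to include 'scroll_cursor'; guard against
    # its absence instead of letting a KeyError abort the whole run.
    cursor = json_data.get('scroll_cursor')
    if not cursor:
        return None
    return origurl + '&scroll_cursor=' + cursor
```

The caller would then stop paginating when `None` comes back instead of crashing.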
Thank you very much for your report! It seems that Twitter changed its web page, so the old system is no longer valid. I tried to reverse-engineer the new one, but I'm missing something. That was a very hackish way to fetch very old tweets, older than the API limit of 3500 tweets. I guess I'll have to remove that feature in the next version :(
As a partial workaround, you can pass the day after your target date as the until (-u) argument: at least some of the target date's tweets will be downloaded before the error occurs.
So you could write a shell script to fetch them, one day at a time:
$ for date in `grep 2016-01 dates.list`; do python ./dumpscraper.py scrapeold -s 2016-01-01 -u ${date}; done
This will only grab the first page's worth of tweets for each day (the part fetched before scrolling triggers the error).
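The loop above assumes a `dates.list` file with one date per line. A minimal sketch for generating it (the helper name `write_dates_list` and the file layout are assumptions, not part of dump-scraper):

```python
from datetime import date, timedelta

def write_dates_list(start, end, path='dates.list'):
    """Write one ISO-formatted date per line, inclusive of both endpoints."""
    with open(path, 'w') as fh:
        day = start
        while day <= end:
            fh.write(day.isoformat() + '\n')
            day += timedelta(days=1)

# Example: cover January 2016, matching the `grep 2016-01` filter above.
write_dates_list(date(2016, 1, 1), date(2016, 1, 31))
```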