
Error scraping old tweets

Open ghost opened this issue 8 years ago • 2 comments

pi@raspberrypi:~/dump-scraper $ python dumpscraper.py --verbose scrapeold -s 2016-01-01 -u 2016-03-28
[DEBUG] =========================================
[DEBUG] Application Started
[DEBUG] =========================================
Dump Scraper 1.0.0 - A better way of scraping
Copyright (C) 2015-2016 FabbricaBinaria - Davide Tampellini
===============================================================================
Dump Scraper is Free Software, distributed under the terms of the GNU General
Public License version 3 or, at your option, any later version.
This program comes with ABSOLUTELY NO WARRANTY as per sections 15 & 16 of the
license. See http://www.gnu.org/licenses/gpl-3.0.html for details.
===============================================================================
[INFO] Processing day: 2016-03-27
[INFO] Processing day: 2016-03-25
Traceback (most recent call last):
  File "dumpscraper.py", line 293, in <module>
    scraper.run()
  File "dumpscraper.py", line 278, in run
    runner.run()
  File "/home/pi/dump-scraper/lib/runner/scrapeold.py", line 109, in run
    url = origurl + '&scroll_cursor=' + json_data['scroll_cursor']
KeyError: 'scroll_cursor'
pi@raspberrypi:~/dump-scraper $
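For reference, the crash happens because Twitter's response no longer includes a scroll_cursor field, so the lookup json_data['scroll_cursor'] raises KeyError. A minimal defensive sketch (the helper name next_page_url is hypothetical; origurl and json_data follow the traceback above, and this is not the project's actual code) would stop paging instead of crashing:

def next_page_url(origurl, json_data):
    """Return the URL of the next results page, or None when Twitter
    no longer provides a scroll cursor (end of results or changed markup)."""
    cursor = json_data.get('scroll_cursor')
    if cursor is None:
        return None
    return origurl + '&scroll_cursor=' + cursor

# Inside the scraping loop one could then exit gracefully:
#     url = next_page_url(origurl, json_data)
#     if url is None:
#         break  # keep whatever tweets were already fetched for this day

This would only paper over the symptom, of course; it does not restore the old scrolling behaviour.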

ghost avatar Apr 08 '16 22:04 ghost

Thank you very much for your report! It seems that Twitter changed its web page, so the old system no longer works. I tried to reverse-engineer the new one, but I'm missing something. That was a very hackish way to fetch very old tweets, older than the API limit of 3500 tweets. I guess I'll have to remove this feature in the next version :(

tampe125 avatar Apr 18 '16 06:04 tampe125

As a partial workaround, you can pass the day after your target date as the upper bound. At least some of the target date's tweets will be downloaded before the error occurs.

So you could write a shell script to fetch them, one day at a time:

$ for date in `grep 2016-01 dates.list`; do python ./dumpscraper.py scrapeold -s 2016-01-01 -u ${date}; done

This will only grab the first page's worth (before scrolling would happen).
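The dates.list file in the loop above is just a plain list of dates, one per line. A minimal sketch for generating it (the filename and the date range here are only an example, not part of dump-scraper):

from datetime import date, timedelta

start, until = date(2016, 1, 1), date(2016, 2, 1)
with open('dates.list', 'w') as fh:
    current = start
    while current < until:
        fh.write(current.isoformat() + '\n')  # e.g. 2016-01-15
        current += timedelta(days=1)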

roycewilliams avatar Aug 01 '16 05:08 roycewilliams