Hands-on-WebScraping
HTTP Status Code Is Not Handled Or Not Allowed
Uh oh...did Twitter break us? Do we have to change the user_agent in settings.py?
2021-09-09 15:34:55 [scrapy.core.engine] INFO: Spider opened
2021-09-09 15:34:55 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-09-09 15:34:55 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-09-09 15:34:55 [root] INFO: 3 hashtags found
2021-09-09 15:34:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://mobile.twitter.com/hashtag/cats>: HTTP status code is not handled or not allowed
2021-09-09 15:34:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://mobile.twitter.com/hashtag/dogs>: HTTP status code is not handled or not allowed
2021-09-09 15:34:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://mobile.twitter.com/hashtag/hello>: HTTP status code is not handled or not allowed
2021-09-09 15:34:55 [scrapy.core.engine] INFO: Closing spider (finished)
I was able to semi-fix this by updating the USER_AGENT on line 17 of hashtag.py to 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0', as suggested by this StackOverflow post.
The issue remains, however, that no tweets are found, which appears to be incorrect.
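In case it helps anyone reproduce this outside Scrapy, here is a rough sketch that uses only the Python standard library to check whether the 403 depends on the User-Agent. The helper names build_request and fetch_status are made up for illustration and are not part of hashtag.py. I have not confirmed how Twitter currently responds to this header, so treat the result as a diagnostic hint only.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# User-Agent string quoted in the post above.
FIREFOX_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0"


def build_request(url: str, user_agent: str = FIREFOX_UA) -> Request:
    """Return a GET request that carries the given User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str = FIREFOX_UA, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (hypothetical helper).

    Network failures (DNS errors, timeouts) are not handled here and will
    propagate as exceptions.
    """
    try:
        with urlopen(build_request(url, user_agent), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; return the code anyway.
        return err.code
```

Calling fetch_status("https://mobile.twitter.com/hashtag/cats") once with the default header and once with a different user_agent value should show whether the status code changes with the header. A 200 here would still say nothing about whether the page contains any tweets, which is the part that remains open.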