HTTP Status Code Is Not Handled Or Not Allowed

Open Huntley30 opened this issue 3 years ago • 1 comments

Uh oh... did Twitter break us? Do we have to change the user_agent in settings.py?

2021-09-09 15:34:55 [scrapy.core.engine] INFO: Spider opened
2021-09-09 15:34:55 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-09-09 15:34:55 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-09-09 15:34:55 [root] INFO: 3 hashtags found
2021-09-09 15:34:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://mobile.twitter.com/hashtag/cats>: HTTP status code is not handled or not allowed
2021-09-09 15:34:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://mobile.twitter.com/hashtag/dogs>: HTTP status code is not handled or not allowed
2021-09-09 15:34:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://mobile.twitter.com/hashtag/hello>: HTTP status code is not handled or not allowed
2021-09-09 15:34:55 [scrapy.core.engine] INFO: Closing spider (finished)
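
For anyone wondering what the settings.py change asked about here might look like, below is a minimal sketch, not the repository's actual configuration. USER_AGENT and HTTPERROR_ALLOWED_CODES are standard Scrapy settings; the specific User-Agent string and the choice to let 403 responses through are assumptions made for illustration.

```python
# settings.py fragment (illustrative sketch; values are assumptions, not the project's own)

# Project-wide User-Agent sent with every request unless a request overrides it.
# The string below is only an example of a browser-like value.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0"

# By default the HttpError spider middleware drops non-2xx responses, which is
# what produces the "Ignoring response <403 ...>" lines in the log.
# Allowing 403 passes those responses to the spider callback so they can be
# inspected while debugging; it does not make the site return real content.
HTTPERROR_ALLOWED_CODES = [403]
```

Allowing 403 is mainly useful for debugging, since the callback can then log the response body and headers instead of the response being dropped silently.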

Huntley30, Sep 09 '21 19:09

I was able to semi-fix this by changing the USER_AGENT on line 17 of hashtag.py to 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0', as suggested by this StackOverflow post.

The issue remains, however, that no tweets are found, which appears to be incorrect.
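
As a rough illustration of the per-request header change described in this comment, here is a standard-library sketch that builds a request carrying that User-Agent without sending it. The helper name build_request is hypothetical and this is not the code in hashtag.py; in a Scrapy spider the same headers dict would typically be passed as the headers argument of scrapy.Request.

```python
from urllib.request import Request

# User-Agent string quoted in the comment above.
FIREFOX_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0"


def build_request(hashtag: str) -> Request:
    """Build (but do not send) a request for a hashtag page.

    build_request is a hypothetical helper used only for this sketch.
    """
    url = f"https://mobile.twitter.com/hashtag/{hashtag}"
    return Request(url, headers={"User-Agent": FIREFOX_UA})


req = build_request("cats")
print(req.full_url)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

A browser-like User-Agent may get past the 403, but as noted above it does not guarantee that the page returned contains any tweets.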

michael-pagan, Sep 29 '21 02:09