GetOldTweets3
HTTP Error, Gives 404 but the URL is working
Hi, I had a script running over the past few weeks, and earlier today it stopped working. I keep receiving HTTPError 404, but the link provided in the error still takes me to a valid page.
Here is the code (all referenced variables are defined, and when debugging I can see that the error happens specifically in the Manager):
import GetOldTweets3 as got

tweetCriteria = got.manager.TweetCriteria().setQuerySearch(term)\
    .setMaxTweets(max_count)\
    .setSince(begin_timeframe)\
    .setUntil(end_timeframe)
scraped_tweets = got.manager.TweetManager.getTweets(tweetCriteria)
The error message is the standard 404: "An error occured during an HTTP request: HTTP Error 404: Not Found Try to open in browser:" followed by the valid link.
Since I haven't changed anything in the folder, I wonder whether something has happened to my configuration rather than anything else, and whether others are experiencing this too.
Hello @sagefuentes, I'm dealing with the exact same issue. I have also been downloading tweets for the past few weeks, and it suddenly stopped working, giving me error 404 with a valid link.
I've no idea what might be the cause...
Same here. I also suddenly ran into this problem today, although everything worked fine yesterday.
I am dealing with the same issue here. This is new today and is caused by changes or bugs on Twitter's server side. Running the command with debug=True shows that the URL used to fetch tweets is no longer available. Looking for a solution now.
Also started having the same issue today.
I'm having the same issue as well! Does anyone have a solution for it?
Yes, I am having the same issue. I guess everyone is having it.
I'm not sure if it is related to this issue, but some of the user_agents seem to be out of date:
user_agents = [
'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0',
'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:62.0) Gecko/20100101 Firefox/62.0',
'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15',
]
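For anyone who wants to experiment with a different User-Agent, here is a minimal sketch of how a scraper of this kind can pick a UA at random from such a list and attach the request headers that appear in the --debug output further down. It only builds the request and does not send it. build_request and USER_AGENTS are hypothetical names for illustration, not GetOldTweets3's own code.

```python
import random
import urllib.request

# Two entries from the user_agents list above; in practice you would keep the full list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
]


def build_request(url, user_agents=USER_AGENTS, rng=random):
    """Build (but do not send) a GET request carrying browser-like headers.

    Host and Connection are omitted: urllib adds Host itself and forces
    "Connection: close" when the request is opened.
    """
    headers = {
        "User-Agent": rng.choice(user_agents),  # rotate UAs between requests
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Accept-Language": "en-US,en;q=0.5",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": url,  # the debug output shows the request URL reused as Referer
    }
    return urllib.request.Request(url, headers=headers)


req = build_request("https://twitter.com/search?q=%20from%3Atwitter&src=typd")
# urllib stores header names in str.capitalize() form, hence "User-agent".
print(req.get_header("User-agent"))
```

Keep in mind that, as reported later in this thread, swapping the UA alone did not make the 404 go away; the sketch just makes that kind of experiment easier.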
same
This seems to be a "bigger" problem? Other scrapers are having problems too: https://github.com/twintproject/twint/issues/915#issue-704034135
Here is the output with debug enabled. It shows the actual URL being called, and it seems that Twitter has removed the /i/search/timeline endpoint. :(
https://twitter.com/i/search/timeline?vertical=news&q=from%3AREDACTED&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Host: twitter.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
Referer: https://twitter.com/i/search/timeline?vertical=news&q=from%3AREDACTED&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Connection: keep-alive
An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser: https://twitter.com/search?q=%20from%3AREDACTED&src=typd
Same problem, damn
I forked the repo and created a branch that allows a user-specified UA, but using samples from my current browser doesn't fix the problem.
I noticed that the search and referrer URL shown in the --debug output (https://twitter.com/i/search/timeline) returns a 404 error:
$ GetOldTweets3 --username twitter --debug
/home/inactivist/.local/bin/GetOldTweets3 --username twitter --debug
GetOldTweets3 0.0.11
Downloading tweets...
https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=from%3Atwitter&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Host: twitter.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
Referer: https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=from%3Atwitter&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Connection: keep-alive
An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser: https://twitter.com/search?q=%20from%3Atwitter&src=typd
$ curl -I https://twitter.com/i/search/timeline
HTTP/2 404
[snip]
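The curl -I check above can also be done from Python, which is handy for probing the endpoint inside a script before starting a long scrape. A minimal sketch using only the standard library; endpoint_status is a hypothetical helper, and its opener parameter exists only so it can be exercised without network access.

```python
import urllib.error
import urllib.request


def endpoint_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code of a HEAD request to url (like curl -I).

    urlopen raises HTTPError for 4xx/5xx responses, so in that case the
    status code is read from the exception instead of a response object.
    Network failures (URLError) are left to propagate to the caller.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

While the endpoint is gone, endpoint_status("https://twitter.com/i/search/timeline") should report 404, matching the curl output.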
EDIT: The URL used for the internal search and the one shown in the exception message aren't the same...
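To see the mismatch concretely, the two URLs from the debug output above can be split with urllib.parse: they hit different paths, and the browser link's query even carries an extra leading space (%20) before from:. A small standard-library sketch; describe is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit

# Both URLs copied from the --debug output above.
REQUEST_URL = ("https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=from%3Atwitter"
               "&src=typd&&include_available_features=1&include_entities=1&max_position="
               "&reset_error_state=false")
BROWSER_URL = "https://twitter.com/search?q=%20from%3Atwitter&src=typd"


def describe(url):
    """Return the path and the decoded query parameters of url."""
    parts = urlsplit(url)
    # keep_blank_values keeps empty params such as max_position=
    return parts.path, parse_qs(parts.query, keep_blank_values=True)


req_path, req_query = describe(REQUEST_URL)
web_path, web_query = describe(BROWSER_URL)
print(req_path, req_query["q"])  # /i/search/timeline ['from:twitter']
print(web_path, web_query["q"])  # /search [' from:twitter']
```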
I tried replacing https://twitter.com/i/search/timeline with https://twitter.com/search?. The 404 error is gone, but now I get a 400 Bad Request error.
Unfortunately I have the same problem; I hope we find a solution as soon as possible.
Switching to mobile.twitter.com/search and using a modern User-Agent header seems to get us past the 400 Bad Request error, but then we get Error parsing JSON
...
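An "Error parsing JSON" failure is what you would expect if the new endpoint returns an HTML page while the code still hands the body to a JSON parser. A minimal sketch of a check that tells the two cases apart, assuming the raw response body is available as bytes; parse_search_response is a hypothetical name, not part of GetOldTweets3.

```python
import json


def parse_search_response(body: bytes):
    """Try to decode a search response as JSON.

    Returns ("json", obj) on success, or ("markup", text) when the body looks
    like HTML/XML (starts with "<"), which is the case that makes a JSON
    parser fail. Any other undecodable body re-raises the JSON error.
    """
    text = body.decode("utf-8", errors="replace")
    try:
        return "json", json.loads(text)
    except json.JSONDecodeError:
        if text.lstrip().startswith("<"):
            return "markup", text
        raise
```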
Same thing for me. I get an error 404 but the URL is working.
I have the same issue.
I am experiencing the same issue. Any plan to fix the issue?
Same issue, somebody please help.
Same issue. The same code was working a day ago; now it's giving error 404 with a valid link.
Maybe it has something to do with this: https://blog.twitter.com/developer/en_us/topics/tips/2020/understanding-the-new-tweet-payload.html
I am having the same issue. It was more robust than Tweepy. I hope we find a solution as soon as possible.
Unfortunately, the Twitter API does not fully meet our needs, because we need full-history search without any limitations. With the Twitter API you can only search 5000 tweets per month.
I hope GetOldTweets starts working again as soon as possible; otherwise I cannot complete my master's thesis.
I have the same issue. I need some help here.
I see! I'm fairly new to scraping, but I'm working on an end-of-course thesis about sentiment analysis and could really use some newer tweets.
I've been tinkering with GOT3's code a bit and got it to read the HTML of the search timeline, but the output is mostly unformatted. As I said, I have little experience with scraping, so I'm really struggling to format it correctly. Still, I'll note my changes, for reference and for someone with more experience to pick up if they wish:
- updated user_agents (updated with the ones used by TWINT);
- updated endpoint (/search?);
- some updates to the URL structure:
url = "https://twitter.com/search?"
# %%20 becomes a literal %20 (encoded leading space) once the template is %-formatted
url += ("q=%%20%s&src=typd%s"
        "&include_available_features=1&include_entities=1&max_position=%s"
        "&reset_error_state=false")
if not tweetCriteria.topTweets:
    url += "&f=live"
Edit: I forgot to mention that sometimes the application gives me a 400 Bad Request; when I run it again, it outputs the HTML as described above.
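The URL template in the changes above relies on %-formatting: %%20 turns into a literal %20 (an encoded leading space, matching the "Try to open in browser" link), and the three %s slots are filled in afterwards. A sketch of that expansion, assuming the slots receive the URL-quoted query, an optional language fragment and the URL-quoted pagination cursor, in that order; build_search_url and its parameters are hypothetical.

```python
import urllib.parse


def build_search_url(query, lang_fragment="", cursor="", top_tweets=False):
    """Expand the modified /search? template quoted above.

    Assumption: the %s slots take the URL-quoted query, an optional language
    fragment, and the URL-quoted max_position cursor, in that order.
    """
    url = "https://twitter.com/search?"
    url += ("q=%%20%s&src=typd%s"
            "&include_available_features=1&include_entities=1&max_position=%s"
            "&reset_error_state=false")
    if not top_tweets:
        url += "&f=live"  # contains no % signs, so it survives the formatting below
    return url % (urllib.parse.quote(query), lang_fragment, urllib.parse.quote(cursor))


print(build_search_url("from:twitter"))
```

urllib.parse.quote leaves only "/" unescaped by default, so the colon in from:twitter comes out as %3A, the same encoding seen in the debug URLs.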
Same issue here. Does anyone have any idea how to solve this?
Same issue here. It has not been working for the past three days.
The HTML problem can easily be solved with some BeautifulSoup manipulation. However, I cannot get the BeautifulSoup code to work, because the script constantly fails with this error:
Error parsing JSON: <?xml version="1.0" encoding="utf-8"?>
This prevents me from continuing to work with the HTML I obtained.
Any idea on how to avoid this error?
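One possible way around this, assuming you can get at the raw response body before GetOldTweets3 tries to decode it as JSON, is to skip the JSON step and parse the markup directly. The sketch below swaps BeautifulSoup for the standard library's html.parser so it has no dependencies; it collects the text of every element carrying a given class. The class name "tweet-text" is a placeholder assumption: inspect the HTML you actually receive to find the element that wraps each tweet's text. The <?xml ...?> prolog is treated as a processing instruction and simply ignored.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class attribute contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0     # > 0 while inside a matching element
        self.chunks = []   # text fragments of the element being collected
        self.results = []  # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested tag inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.chunks).strip())

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_texts(markup, target_class):
    """Return the stripped text of each element tagged with target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(markup)
    parser.close()
    return parser.results


# "tweet-text" is a hypothetical class name; check the real markup first.
sample = ('<?xml version="1.0" encoding="utf-8"?>'
          '<div class="tweet-text">Hello <b>world</b><br>!</div>'
          '<div class="other">skip</div>')
print(extract_texts(sample, "tweet-text"))  # ['Hello world!']
```

If the BeautifulSoup dependency is acceptable, soup.find_all(class_="tweet-text") does the same job and copes better with malformed markup.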
Same issue here. I think this is because Twitter has removed the endpoint https://twitter.com/i/search/timeline?
Same issue for me and some of my colleagues. I noticed that other scrapers, such as Twint, are also running into this problem: https://github.com/twintproject/twint/issues/918
This is consistent with the explanation above: an update on Twitter's side is likely what stopped these scraping libraries from working.