snscrape
All Twitter scrapes are failing: `blocked (404)`
With the exception of twitter-trends, all Twitter scrapes are failing since sometime in the past hour. This is likely connected to Twitter as a whole getting locked behind a login wall since earlier today. There is no known workaround at this time, and it's not known whether this will be fixable.
So sad :-( My research project relies heavily on this lib, and I pay tribute to your effort in maintaining it.
Twitter disabled their public web site today (2023-06-30) and now requires users to log in; Twitter used to be public prior to this date. Would it be possible to automate the login as well by providing a username and password to snscrape, i.e. logging in via a GraphQL API before scraping, to simulate a logged-in session?
I do not think the developer would do this, as he said auth would never be added as a feature: see #270. Let's see what our great developers come up with; I hope it does not take long.
Before using this library, I had started doing manual scraping myself using Puppeteer, and I had automated the sign-in part (even through 2FA). The issue is that if you sign in frequently within a short period of time, you get blocked by Twitter and cannot sign in again for a certain amount of time. So I'm not sure what the ideal setup would be in this case...
If this comment is off-topic, please consider deleting it. To be clear, it was about Twitter failing in this regard, not you, btw.
Please consider deleting my prior off-topic comment.
Don't nuke this one as off-topic: A Twitter employee says it's temporary:
https://twitter.com/AqueelMiq/status/1674843555486134272 "this is a temporary restriction, we will re-enable logged out twitter access in the near future"
Elon talked about it too 💀 https://twitter.com/elonmusk/status/1674942336583757825
Can I use my personal OAuth key with snscrape for Twitter?
Musk referred to EXTREME scraping, indicating that scrapers may no longer be functional after the changes. Let's see how it plays out.
Can I edit the "twitter.py" module with my own bearer token or OAuth login key (locally, on my computer where I installed the snscrape module), since that would only change my local copy? Thanks.
Hello,
This may or may not help. Here's a route to access Tweets without logging in (contains further iframe to platform.twitter.com): https://cdn.embedly.com/widgets/media.html?type=text%2Fhtml&key=a19fcc184b9711e1b4764040d3dc5c07&schema=twitter&url=https://twitter.com/elonmusk/status/1674865731136020505
Would combining this with a pre-existing list of Tweets allow data scraping to continue? Alternatively users could build the tweet list using google search, e.g. for Tesla tweets: "site:twitter.com/tesla/status" or via another cached list (e.g. Waybackmachine - https://web.archive.org/web/*/https://twitter.com/tesla/status*)
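The suggestion above can be sketched as a small helper: collect twitter.com status URLs (from a "site:" search or a Wayback listing), pull the status IDs out of them, and build an embedly viewer URL for each. The regex and the embedly key below are taken from the URL in this thread; treat the whole thing as a sketch, not a supported API.

```python
import re
from urllib.parse import quote

# Embedly viewer base, using the key from the URL shared above.
EMBEDLY_BASE = (
    "https://cdn.embedly.com/widgets/media.html"
    "?type=text%2Fhtml&key=a19fcc184b9711e1b4764040d3dc5c07&schema=twitter&url="
)

# Matches twitter.com/<user>/status/<id> anywhere in a URL,
# including inside a Wayback Machine URL.
STATUS_RE = re.compile(r"twitter\.com/([A-Za-z0-9_]+)/status/(\d+)")

def embed_urls(collected_urls):
    """Deduplicate status IDs and build an embedly viewer URL for each."""
    seen = set()
    out = []
    for u in collected_urls:
        m = STATUS_RE.search(u)
        if not m:
            continue
        user, tweet_id = m.groups()
        if tweet_id in seen:
            continue
        seen.add(tweet_id)
        tweet_url = f"https://twitter.com/{user}/status/{tweet_id}"
        out.append(EMBEDLY_BASE + quote(tweet_url, safe=""))
    return out

urls = embed_urls([
    "https://web.archive.org/web/2023/https://twitter.com/tesla/status/1674865731136020505",
    "https://twitter.com/tesla/status/1674865731136020505",  # duplicate ID, skipped
])
print(urls)
```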
If I'm off the mark, I apologise but thought I'd pass this on, on the off chance it may help at least as a temporary measure.
Just a note to @JustAnotherArchivist - thank you for the hard work you have put into this library - it is very much appreciated
Ben
URL: https://cdn.syndication.twimg.com/tweet-result
CODE:
import requests

url = "https://cdn.syndication.twimg.com/tweet-result"
querystring = {"id": "1652193613223436289", "lang": "en"}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Origin": "https://platform.twitter.com",
    "Connection": "keep-alive",
    "Referer": "https://platform.twitter.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
    "TE": "trailers",
}

# A GET request needs no body, so the empty payload is dropped.
response = requests.get(url, headers=headers, params=querystring)
print(response.text)
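The endpoint returns JSON, so a few fields can be pulled out directly. The field names below ("id_str", "text", "user.screen_name", "created_at") match what the syndication payload looked like at the time of this thread; treat them as assumptions and inspect response.json() yourself before relying on them.

```python
# Hedged sketch: summarize a tweet-result payload into a flat dict.
# Field names are assumptions based on the syndication JSON shape.
def summarize_tweet(payload: dict) -> dict:
    user = payload.get("user", {})
    return {
        "id": payload.get("id_str"),
        "user": user.get("screen_name"),
        "text": payload.get("text"),
        "created_at": payload.get("created_at"),
    }

# Example with a hand-made payload mimicking the endpoint's response.
sample = {
    "id_str": "1652193613223436289",
    "user": {"screen_name": "example"},
    "text": "hello world",
    "created_at": "2023-04-29T00:00:00.000Z",
}
print(summarize_tweet(sample))
```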
The syndication endpoint seems to be working; the problems might be rate limits and stability, so more tests are needed.
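When probing an endpoint of unknown rate limits, an exponential-backoff schedule keeps the request rate polite. The base delay and cap below are arbitrary starting points, not values Twitter documents anywhere.

```python
import random

def backoff_delays(attempts, base=1.0, cap=60.0, jitter=False):
    """Build an exponential backoff schedule: base, 2*base, 4*base, ... capped."""
    delays = []
    for n in range(attempts):
        d = min(cap, base * (2 ** n))
        if jitter:
            # Full jitter: sleep a random amount up to the computed delay.
            d = random.uniform(0, d)
        delays.append(d)
    return delays

print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

A caller would sleep for each delay between retries and stop once a request succeeds.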
It does not allow you to see all the accounts followed by a user either; would there be a solution for that?
https://twitter.com/elonmusk/status/1675187969420828672
😂
@elonmusk To address extreme levels of data scraping & system manipulation, we’ve applied the following temporary limits:
- Verified accounts are limited to reading 6000 posts/day
- Unverified accounts to 600 posts/day
- New unverified accounts to 300/day
My IP was banned even though I was using a proxy that changes the IP dynamically. What options do we have now?
@JustAnotherArchivist Are the scrapers working anytime soon? Also, I want to thank you for your hard work on these scrapers.
Scraping seems to be still possible, check this:
https://rss-bridge.org/bridge01/?action=display&bridge=TwitterBridge&context=By+username&u=elonmusk&format=html
https://rss-bridge.org/bridge01/?action=display&bridge=TwitterBridge&context=By+username&u=elonmusk&format=json
By https://github.com/RSS-Bridge/rss-bridge
While cool, it's using API v1 and you can't get long tweets.
Hi guys, I'm new to GitHub and coding, but maybe this is helpful:
https://twitter.com/iam4x/status/1675194767854956546?s=20
This hasn't worked for a long time.
What about using Selenium first to log in, and after that using sntwitter to get tweets? The question here is how to link the Selenium session with sntwitter.
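One hedged way to "link" the two: log in with Selenium, then copy the browser's cookies into a requests.Session that other code (or a patched scraper) uses. The cookie dicts below mimic what Selenium's driver.get_cookies() returns; the "auth_token" cookie name is only an illustration, not a documented interface.

```python
import requests

def session_from_selenium_cookies(cookies):
    """Copy Selenium-style cookie dicts into a requests.Session."""
    s = requests.Session()
    for c in cookies:
        s.cookies.set(
            c["name"],
            c["value"],
            domain=c.get("domain"),
            path=c.get("path", "/"),
        )
    return s

# Example with a fake cookie list shaped like driver.get_cookies() output.
fake_cookies = [{"name": "auth_token", "value": "abc123", "domain": ".twitter.com"}]
s = session_from_selenium_cookies(fake_cookies)
print(s.cookies.get("auth_token"))  # abc123
```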
lol, this seems to be working... never mind. It was fun for a few minutes, but it messes up the rest of the features, so no, lol.
The beauty of snscrape is that it doesn't require authentication. If we're going to have to start using login/auth and tools like Selenium, then it should be spun off into another project and not snscrape. Also, using any form of auth gives Twitter another way to ban mass collection, which is the use case for many users of snscrape.
Hi! :)) The syndication endpoint works great! Is there perhaps any way to scrape repost and comment data as well? I need to map tweet spread for my master's thesis, but what companies like Twitter and Reddit have been doing with their APIs lately is terrible...
You are describing my situation: I need the comments for the same purpose. Please let me know when you find a solution; my submission is in September.
So you would rather have it completely stop working for all other use cases as well?
@IrtzaShahan #270
It would be great if snscrape added a new class like TwitterProfileScraperSyn that grabs tweet data from the still publicly available syndication profile feeds. The syndication feed shows 20 tweets, which is good for many applications.
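As a sketch of what such a TwitterProfileScraperSyn helper might request: the syndication timeline endpoint below is what was observed in the wild around this time, but its exact URL and parameters are an assumption to verify in your browser's devtools before building on it.

```python
from urllib.parse import urlencode

# Assumed syndication timeline endpoint; verify it still exists before use.
SYN_TIMELINE = "https://syndication.twitter.com/srv/timeline-profile/screen-name/"

def profile_feed_url(screen_name: str, lang: str = "en") -> str:
    """Build the (assumed) syndication profile feed URL for a user."""
    return SYN_TIMELINE + screen_name + "?" + urlencode({"lang": lang})

print(profile_feed_url("tesla"))
```

Fetching this URL with the same browser-like headers as the tweet-result example above would be the next step; the response would still be limited to roughly the 20 most recent tweets.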
Great!
Is there any other param I can put in the querystring besides the tweet ID? I want to get tweets from specific users, but I can't find which params to use.