twitterscraper
Scraping twitter threads?
Is there a way to scrape entire tweet threads instead of individual tweets? Twitter is now also rolling out new features that let people link their tweets into threads even if they were not tweeted directly in sequence. Is there any information recorded in the output file that could point to tweets being in the same thread, or even better, a way to directly scrape an entire Twitter thread if a certain keyword occurs anywhere within the tweets that make up that thread?
I'm having the same problem.
In the current implementation, we can't get threads directly. For now, we have no choice but to do the following (a TERRIBLE way):
1. Get tweets as a list of `twitterscraper.tweet.Tweet` objects (hereinafter referred to as `Tweet` objects).
2. Extract one `Tweet` object from the list.
3. Extract the parent's tweet id from the `Tweet` object via its attribute `Tweet.parent_tweet_id`.
4. Search for the parent tweet by tweet id and combine it with the child tweet.
5. Repeat 1-4 until we finally have the whole Twitter thread.
With the above procedure, of course, threads that contain tweets from protected accounts can NOT be extracted. However, this limitation comes from Twitter's own specs, so there is nothing we can do about it.
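The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the `Tweet` dataclass is a stand-in for `twitterscraper.tweet.Tweet` carrying just the fields used here, and `fetch_by_id` is a hypothetical callback for "search parent tweet by tweet id" (here backed by an in-memory dict rather than a real search). A missing parent, e.g. one from a protected account, simply ends the walk.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Tweet:
    # Stand-in for twitterscraper.tweet.Tweet (assumption: only these fields are needed).
    tweet_id: str
    parent_tweet_id: Optional[str]
    text: str


def build_thread(start: Tweet,
                 fetch_by_id: Callable[[str], Optional[Tweet]]) -> List[Tweet]:
    """Follow parent_tweet_id links upward; return the chain ordered root first."""
    chain = [start]
    seen = {start.tweet_id}
    current = start
    while current.parent_tweet_id and current.parent_tweet_id not in seen:
        # Hypothetical lookup: in practice this would be a search by tweet id.
        parent = fetch_by_id(current.parent_tweet_id)
        if parent is None:
            # Parent not retrievable (protected or deleted): stop here.
            break
        chain.append(parent)
        seen.add(parent.tweet_id)
        current = parent
    chain.reverse()
    return chain


# Toy data standing in for scraped tweets; tweet "5" points at a missing parent.
store = {
    "1": Tweet("1", None, "first tweet"),
    "2": Tweet("2", "1", "second tweet"),
    "3": Tweet("3", "2", "third tweet"),
    "5": Tweet("5", "4", "reply to a protected tweet"),
}

thread = build_thread(store["3"], store.get)
partial = build_thread(store["5"], store.get)
```

The `seen` set is only a guard against looping forever on inconsistent data; each parent lookup would still cost one request per tweet, which is why this approach is slow.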
Hey, thanks for the reply!
I'm toying around with the code at the moment to see if that is also applicable for extracting consecutive tweets in a thread that are written by the same author (sounds ultra-specific, but I'm interested in doing analysis on documents that are comprised of longer blocks of text written by the same person). So if a person writes a tweet thread rant spanning tweets 1 to n, then someone replies and the original poster replies back (and so on), I'm only looking for those tweets 1 to n; the rest of the thread (1) gets too complicated to extract blocks of comments from, and (2) probably does not interest me within the scope of this research.
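One way to express "only tweets 1 to n by the original poster" is to take the leading run of same-author tweets from a thread that is already ordered root first. This is a minimal sketch, not a tested recipe against real data: `ThreadTweet` and its `username` field are assumptions made for illustration, and the thread is assumed to have been reconstructed beforehand.

```python
from itertools import takewhile
from typing import List, NamedTuple


class ThreadTweet(NamedTuple):
    # Hypothetical minimal record; the field names are assumptions.
    username: str
    text: str


def opening_run(thread: List[ThreadTweet]) -> List[ThreadTweet]:
    """Return the tweets from the start of the thread up to, but not
    including, the first tweet written by someone other than the root author."""
    if not thread:
        return []
    author = thread[0].username
    return list(takewhile(lambda t: t.username == author, thread))


# Toy thread: alice posts two tweets, bob replies, alice replies back.
example = [
    ThreadTweet("alice", "part one of the rant"),
    ThreadTweet("alice", "part two of the rant"),
    ThreadTweet("bob", "a reply"),
    ThreadTweet("alice", "reply to the reply"),
]

run = opening_run(example)
document = " ".join(t.text for t in run)
```

Stopping at the first tweet by another author deliberately drops the original poster's later replies, which matches the "tweets 1 to n only" requirement.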