twitterscraper

Scraping twitter threads?

Open shorouq-z opened this issue 5 years ago • 2 comments

Is there a way to scrape entire tweet threads instead of individual tweets? Twitter is now also rolling out new features that let people link their tweets into threads even if they were not tweeted directly in sequence. Is there any information recorded in the output file that could indicate that tweets belong to the same thread, or, even better, a way to directly scrape an entire Twitter thread if a certain keyword occurs anywhere within the tweets that make up that thread?

shorouq-z, Feb 21 '20 09:02

I'm having the same problem.

In the current implementation, we can't get a thread directly. For now, we have no choice but to do the following (a TERRIBLE way):

  1. Get tweets as a list of twitterscraper.tweet.Tweet objects (hereinafter referred to as Tweet objects).
  2. Take one Tweet object from the list.
  3. Read the parent's tweet ID from the Tweet object's attribute Tweet.parent_tweet_id.
  4. Search for the parent tweet by that tweet ID and join it with the child tweet.
  5. Repeat 1-4 until the whole Twitter thread has been collected.

Of course, with the above procedure, threads that contain tweets from protected accounts can NOT be extracted. However, that limitation comes from Twitter's own specs, so there is nothing we can do about it.
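The procedure above can be sketched roughly as follows. This is only an illustration, not twitterscraper code: the `Tweet` dataclass is a stand-in with just the fields used here, `build_thread` is a made-up helper name, and `fetch_by_id` is a hypothetical callback for whatever lookup you use to find a tweet by ID. It also assumes `parent_tweet_id` holds the ID of the tweet being replied to. When a parent can't be fetched (for example, a protected account), the walk simply stops and returns the partial thread.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Tweet:
    # Stand-in for twitterscraper.tweet.Tweet; only the fields used here.
    tweet_id: str
    parent_tweet_id: Optional[str]
    screen_name: str
    text: str


def build_thread(
    leaf: Tweet,
    known: Dict[str, Tweet],
    fetch_by_id: Callable[[str], Optional[Tweet]],
) -> List[Tweet]:
    """Walk parent links from `leaf` towards the root; return root-first."""
    chain = [leaf]
    seen = {leaf.tweet_id}  # guards against accidental cycles
    current = leaf
    while current.parent_tweet_id and current.parent_tweet_id not in seen:
        parent_id = current.parent_tweet_id
        # Prefer tweets we already scraped; fall back to the (hypothetical) lookup.
        parent = known.get(parent_id) or fetch_by_id(parent_id)
        if parent is None:
            break  # parent unavailable, e.g. protected account
        known[parent_id] = parent
        chain.append(parent)
        seen.add(parent_id)
        current = parent
    chain.reverse()
    return chain


# Usage with in-memory data; the lookup here never finds anything new.
tweets = [
    Tweet("1", None, "alice", "first"),
    Tweet("2", "1", "alice", "second"),
    Tweet("3", "2", "bob", "reply"),
]
known = {t.tweet_id: t for t in tweets}
thread = build_thread(tweets[2], known, lambda _id: None)
print([t.tweet_id for t in thread])  # ['1', '2', '3']
```

Note that this only recovers the chain of ancestors of one tweet, not sibling replies, so it gives one branch of the conversation per starting tweet.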

nukopy, Feb 21 '20 11:02

Hey, thanks for the reply!

I'm toying around with the code at the moment to see if that is also applicable for extracting consecutive tweets in a thread that are written by the same author (sounds ultra-specific, but I'm interested in doing analysis on documents that are composed of longer blocks of text written by the same person). So if a person writes a tweet thread rant spanning tweets 1 to n, then someone replies and the original poster replies back (and so on), I'm only looking for those first n tweets. The rest of the thread (a) gets too complicated to extract blocks of comments from, and (b) probably does not interest me within the scope of this research.
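Here is a minimal sketch of what I mean, assuming the thread has already been assembled as a root-first list. `Post` and `author_run` are names I made up for illustration; only `screen_name`, `tweet_id`, `parent_tweet_id` and `text` mirror attribute names on the Tweet object. It keeps the opening run of tweets by the original poster and stops at the first tweet by anyone else, so later back-and-forth replies are dropped.

```python
from typing import List, NamedTuple, Optional


class Post(NamedTuple):
    # Illustrative stand-in for a scraped tweet.
    tweet_id: str
    parent_tweet_id: Optional[str]
    screen_name: str
    text: str


def author_run(thread: List[Post]) -> List[Post]:
    """Return the leading tweets written by the root tweet's author.

    `thread` must be ordered root-first. The run ends at the first tweet
    by a different author, even if the original poster replies again later.
    """
    if not thread:
        return []
    author = thread[0].screen_name
    run = []
    for post in thread:
        if post.screen_name != author:
            break
        run.append(post)
    return run


thread = [
    Post("1", None, "alice", "Part one."),
    Post("2", "1", "alice", "Part two."),
    Post("3", "2", "bob", "Interesting!"),
    Post("4", "3", "alice", "Thanks."),
]
block = author_run(thread)
document = " ".join(p.text for p in block)
print(document)  # Part one. Part two.
```

The joined `document` string is then one block of text per thread, which is the unit I want for the analysis.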

shorouq-z, Feb 22 '20 14:02