russian-troll-tweets

Should be available via BitTorrent and as a web database that can be queried

Open • scripting opened this issue 6 years ago • 8 comments

First, thanks for making the data available. I was asking about this recently. I would like to get a look at the troll tweets; it might help us avoid arguing with them in the future.

However --

I wasn't able to download the file, and this is not a great way to distribute the info. Better would be:

  1. BitTorrent distribution. It was made for data like this. GitHub, not so much.

  2. And it would be wonderful to have this online as a database that can be queried with SQL commands.

I would be happy to help with either or both projects, assuming they don't already exist.

Thanks again for uploading the data.

Dave

scripting • Jul 31 '18 16:07

I just tweeted about your second point [tweet], since I've imported the dataset into BigQuery, which has a free tier (the first 1 TB of query processing each month is free). The dataset is public.

You can query the dataset like so:

```sql
SELECT author, content, followers
FROM `optimum-rock-145719.fivethirtyeight_russian_troll_tweets.russian_troll_tweets`
WHERE language = "English"
ORDER BY followers DESC
LIMIT 5
```
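For anyone who would rather run this kind of query offline (the SQL-database idea from the original post), here is a minimal sketch using Python's standard `csv` and `sqlite3` modules. It assumes the repo's CSV files have been downloaded and share one header row that includes `author`, `content`, `language` and `followers`; the helper names `load_csvs` and `top_english` and the table name `tweets` are hypothetical. Values are stored as the strings read from the CSV, so `followers` is cast to an integer before sorting.

```python
import csv
import sqlite3
from typing import Iterable, List, Tuple


def load_csvs(paths: Iterable[str], db: sqlite3.Connection, table: str = "tweets") -> int:
    """Load CSV files that share one header row into a single SQLite table.

    Hypothetical helper. Columns are declared without a type, so values are
    stored as the strings read from the CSV. Returns the number of rows inserted.
    """
    total = 0
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)
            cols = ", ".join(f'"{name}"' for name in header)
            db.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            marks = ", ".join("?" for _ in header)
            rows = [row for row in reader if row]  # skip blank lines
            db.executemany(f'INSERT INTO "{table}" ({cols}) VALUES ({marks})', rows)
            total += len(rows)
    db.commit()
    return total


# Same shape as the BigQuery example: top English-language tweets by follower count.
# Single quotes make 'English' a string literal in SQLite.
TOP_ENGLISH = """
SELECT author, content, followers
FROM tweets
WHERE language = 'English'
ORDER BY CAST(followers AS INTEGER) DESC
LIMIT ?
"""


def top_english(db: sqlite3.Connection, limit: int = 5) -> List[Tuple]:
    """Run the query against the hypothetical `tweets` table."""
    return db.execute(TOP_ENGLISH, (limit,)).fetchall()
```

Usage might look like `db = sqlite3.connect("troll_tweets.db")`, then `load_csvs(sorted(glob.glob("IRAhandle_tweets_*.csv")), db)` and `top_english(db)`; the `IRAhandle_tweets_*.csv` pattern is an assumption about how the downloaded files are named. Without the `CAST`, text comparison would rank "900" followers above "10000".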

elithrar • Jul 31 '18 16:07

@elithrar Thank you so much for putting it into BigQuery!

chohenry • Jul 31 '18 18:07

I had not used BigQuery before. Here's the link to the query you ran.

https://console.cloud.google.com/bigquery?_ga=2.22451449.-337486084.1533083301&pli=1&project=nimble-gearing-94719&folder&organizationId&j=bquxjob_c3fe756_164f2e2d821&page=queryresults

scripting • Aug 01 '18 00:08

I’m going to put a blog post up in the next day or so that walks through how to use BigQuery to explore this dataset, including how to make the most of the free tier with good query habits.

Will link back here!

elithrar • Aug 01 '18 02:08

I put the tweets online here, with a search interface:

http://24ahead.com/influence-tweets

24AheadDotCom • Aug 01 '18 18:08

Tweets can also be queried here: http://www.fromrussiawithtroll.com/

fabioporta • Aug 04 '18 16:08

Better late than never: I've posted a guide to querying my hosted dataset using BigQuery - https://blog.questionable.services/article/diving-into-fivethirtyeight-troll-tweets-bigquery/

e.g.

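```sql
SELECT
  author,
  COUNT(*) AS count,
  -- Share of all tweets in the table; FORMAT returns a STRING
  FORMAT("%.2f", COUNT(*) / (
    SELECT
      COUNT(*)
    FROM
      `optimum-rock-145719.fivethirtyeight_russian_troll_tweets.russian_troll_tweets`) * 100) AS percent
FROM
  `optimum-rock-145719.fivethirtyeight_russian_troll_tweets.russian_troll_tweets`
GROUP BY
  author
ORDER BY
  -- Sort on the numeric count; the formatted percent string would sort lexicographically
  count DESC
LIMIT
  10
```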
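The `percent` column is each author's tweet count divided by the table's total row count, times 100. A minimal Python sketch of that arithmetic over a plain list of author names (the function name `author_shares` and its input are hypothetical, not part of the guide) ranks by the integer count and only formats the percentage for display:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def author_shares(authors: Iterable[str], top: int = 10) -> List[Tuple[str, int, str]]:
    """Return (author, count, percent) for the `top` most frequent authors.

    percent = count / total * 100, formatted to two decimals like FORMAT("%.2f", ...).
    Ranking uses the integer count, never the formatted string.
    """
    counts = Counter(authors)
    total = sum(counts.values())
    return [
        (author, n, f"{n / total * 100:.2f}")
        for author, n in counts.most_common(top)  # sorted by count, descending
    ]
```

Ranking on the formatted strings instead would be lexicographic, so "9.50" would sort above "10.20" in descending order.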

elithrar • Aug 19 '18 06:08

We've also put together a tool for querying the tweets online: https://russiatweets.com

chrisgherbert • Sep 27 '18 05:09