russian-troll-tweets
Desperately Seeking Schema
It would be nice if the table in the README could be updated with information about the type of each field. In particular, for those fields that are enumerated constants (such as `post_type` and `account_type`), list the set of valid values, and for all fields indicate whether they are nullable. Since the data format is not raw Twitter data, maybe a link to https://help.salesforce.com/articleView?id=mc_ss_csv_report_headers.htm&type=5 would be helpful, too.
I’m not 100% certain about which fields may be nullable. But here’s a list of enumerated constant field values, for what it’s worth:
- account_category
  - Commercial
  - Fearmonger
  - HashtagGamer
  - LeftTroll
  - NewsFeed
  - NonEnglish
  - RightTroll
  - Unknown
- account_type (nullable)
  - ?
  - Arabic
  - Commercial
  - `Ebola ` (contains a trailing space)
  - French
  - German
  - Hashtager
  - Italian
  - Koch
  - left
  - local
  - news
  - Portuguese
  - right
  - Right
  - Russian
  - Spanish
  - Ukranian
  - Uzbek
  - ZAPOROSHIA
- language (possibly nullable?)
  - Albanian
  - Arabic
  - Bengali
  - Bulgarian
  - Catalan
  - Croatian
  - Czech
  - Danish
  - Dutch
  - English
  - Estonian
  - Farsi (Persian)
  - Finnish
  - French
  - German
  - Greek
  - Gujarati
  - Hebrew
  - Hindi
  - Hungarian
  - Icelandic
  - Indonesian
  - Italian
  - Japanese
  - Kannada
  - Korean
  - Kurdish
  - LANGUAGE UNDEFINED
  - Latvian
  - Lithuanian
  - Macedonian
  - Malay
  - Malayalam
  - Norwegian
  - Polish
  - Portuguese
  - Pushto
  - Romanian
  - Russian
  - Serbian
  - Simplified Chinese
  - Slovak
  - Slovenian
  - Somali
  - Spanish
  - Swedish
  - Tagalog (Filipino)
  - Tamil
  - Telugu
  - Thai
  - Traditional Chinese
  - Turkish
  - Ukrainian
  - Urdu
  - Uzbek
  - Vietnamese
- post_type (nullable)
  - QUOTE_TWEET
  - RETWEET
- region (nullable)
  - Afghanistan
  - Austria
  - Azerbaijan
  - Belarus
  - Canada
  - Czech Republic
  - Denmark
  - Egypt
  - Estonia
  - Finland
  - France
  - Germany
  - Greece
  - Hong Kong
  - India
  - "Iran, Islamic Republic of"
  - Iraq
  - Israel
  - Italy
  - Japan
  - Latvia
  - Malaysia
  - Mexico
  - Russian Federation
  - Samoa
  - Saudi Arabia
  - Serbia
  - Spain
  - Sweden
  - Switzerland
  - Turkey
  - Ukraine
  - United Arab Emirates
  - United Kingdom
  - United States
  - Unknown
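For anyone who wants to re-derive a list like the one above, here is a minimal Python sketch of one way to do it. It is not how the list was actually produced. The helper names (`summarize_columns`, `summarize_csv`) are hypothetical, and it assumes that an empty or missing CSV cell means the field is null. It also flags values with leading or trailing whitespace, like the `Ebola ` case.

```python
import csv


def summarize_columns(rows, columns):
    """Collect distinct values, nullability, and padded values per column.

    `rows` is any iterable of dicts (for example, a csv.DictReader).
    Assumption: an empty string or a missing cell counts as null.
    """
    summary = {
        c: {"values": set(), "nullable": False, "padded": set()}
        for c in columns
    }
    for row in rows:
        for c in columns:
            value = row.get(c)
            if value is None or value == "":
                summary[c]["nullable"] = True
                continue
            summary[c]["values"].add(value)
            # Record values that carry leading or trailing whitespace.
            if value != value.strip():
                summary[c]["padded"].add(value)
    return summary


def summarize_csv(path, columns):
    """Run summarize_columns over one CSV file (the path is supplied by the caller)."""
    with open(path, newline="", encoding="utf-8") as f:
        return summarize_columns(csv.DictReader(f), columns)
```

Calling `summarize_csv` once per data file with `["account_category", "account_type", "language", "post_type", "region"]` and merging the results would give the full value sets. A field is nullable if any file reports it as such.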
Schema can be found in my project, https://github.com/EvanCarroll/russian-troll-tweets/blob/master/PostgreSQL/create.psql
New version 2.0 schema for PostgreSQL. It now has primary keys (unique Twitter IDs) and int8 account IDs. https://github.com/EvanCarroll/russian-troll-tweets/blob/version_2/PostgreSQL/create.psql
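Before loading the data under constraints like these, it may help to check that the IDs actually satisfy them. The sketch below is only an illustration and is not taken from the linked schema. The function name `check_ids` is hypothetical, and the default column names `tweet_id` and `external_author_id` are assumptions about the CSV headers. It reports tweet IDs that occur more than once and account IDs that are not integers in the signed 64-bit (int8) range.

```python
from collections import Counter

# Range of PostgreSQL int8 (signed 64-bit integer).
INT8_MIN, INT8_MAX = -(2**63), 2**63 - 1


def check_ids(rows, key_column="tweet_id", account_column="external_author_id"):
    """Return (duplicate key values, account IDs that do not fit int8).

    `rows` is any iterable of dicts; both column names are assumptions.
    """
    seen = Counter()
    bad_accounts = set()
    for row in rows:
        seen[row[key_column]] += 1
        raw = row[account_column]
        try:
            value = int(raw)
        except (TypeError, ValueError):
            # Not a plain integer string (empty, None, scientific notation, ...).
            bad_accounts.add(raw)
            continue
        if not INT8_MIN <= value <= INT8_MAX:
            bad_accounts.add(raw)
    duplicates = {key for key, count in seen.items() if count > 1}
    return duplicates, bad_accounts
```

If both returned sets are empty for every file, a primary key on the tweet ID and an int8 account ID column should load without constraint errors.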