russian-troll-tweets

Desperately Seeking Schema

Open jpallas opened this issue 6 years ago • 3 comments

It would be nice if the table in the README could be updated with the type of each field. In particular, for fields that are enumerated constants (such as post_type and account_type), list the set of valid values, and for all fields indicate whether they are nullable. Since the data is not in raw Twitter format, maybe a link to https://help.salesforce.com/articleView?id=mc_ss_csv_report_headers.htm&type=5 would be helpful, too.

jpallas, Aug 10 '18 18:08

I’m not 100% certain about which fields may be nullable. But here’s a list of enumerated constant field values, for what it’s worth:

  • account_category
    • Commercial
    • Fearmonger
    • HashtagGamer
    • LeftTroll
    • NewsFeed
    • NonEnglish
    • RightTroll
    • Unknown
  • account_type (nullable)
    • ?
    • Arabic
    • Commercial
    • Ebola  (contains a trailing space)
    • French
    • German
    • Hashtager
    • Italian
    • Koch
    • left
    • local
    • news
    • Portuguese
    • right
    • Right
    • Russian
    • Spanish
    • Ukranian
    • Uzbek
    • ZAPOROSHIA
  • language (possibly nullable?)
    • Albanian
    • Arabic
    • Bengali
    • Bulgarian
    • Catalan
    • Croatian
    • Czech
    • Danish
    • Dutch
    • English
    • Estonian
    • Farsi (Persian)
    • Finnish
    • French
    • German
    • Greek
    • Gujarati
    • Hebrew
    • Hindi
    • Hungarian
    • Icelandic
    • Indonesian
    • Italian
    • Japanese
    • Kannada
    • Korean
    • Kurdish
    • LANGUAGE UNDEFINED
    • Latvian
    • Lithuanian
    • Macedonian
    • Malay
    • Malayalam
    • Norwegian
    • Polish
    • Portuguese
    • Pushto
    • Romanian
    • Russian
    • Serbian
    • Simplified Chinese
    • Slovak
    • Slovenian
    • Somali
    • Spanish
    • Swedish
    • Tagalog (Filipino)
    • Tamil
    • Telugu
    • Thai
    • Traditional Chinese
    • Turkish
    • Ukrainian
    • Urdu
    • Uzbek
    • Vietnamese
  • post_type (nullable)
    • QUOTE_TWEET
    • RETWEET
  • region (nullable)
    • Afghanistan
    • Austria
    • Azerbaijan
    • Belarus
    • Canada
    • Czech Republic
    • Denmark
    • Egypt
    • Estonia
    • Finland
    • France
    • Germany
    • Greece
    • Hong Kong
    • India
    • "Iran, Islamic Republic of"
    • Iraq
    • Israel
    • Italy
    • Japan
    • Latvia
    • Malaysia
    • Mexico
    • Russian Federation
    • Samoa
    • Saudi Arabia
    • Serbia
    • Spain
    • Sweden
    • Switzerland
    • Turkey
    • Ukraine
    • United Arab Emirates
    • United Kingdom
    • United States
    • Unknown
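
For anyone who wants to reproduce or re-check a list like the one above, here is a minimal sketch of how the distinct values and the "seen empty at least once" columns could be collected from a CSV. The function name `summarize_columns` and the three sample rows are made up for illustration; only the column names and values that already appear in this thread are taken from the data. Values are deliberately not stripped, so oddities such as a trailing space would show up as separate entries.

```python
import csv
import io
from collections import defaultdict


def summarize_columns(rows, enum_columns):
    """Collect distinct values for the given columns and note which columns are ever empty.

    `rows` is any iterable of dicts (e.g. a csv.DictReader).
    Returns (distinct_values_by_column, columns_seen_empty).
    """
    distinct = defaultdict(set)
    nullable = set()
    for row in rows:
        for name, value in row.items():
            if value is None or value == "":
                # An empty cell is treated as NULL for that column.
                nullable.add(name)
            elif name in enum_columns:
                # Keep the raw value (no strip) so stray whitespace stays visible.
                distinct[name].add(value)
    return dict(distinct), nullable


# Hypothetical sample rows, NOT taken from the real files.
SAMPLE = """author,region,post_type,account_type,account_category
A,United States,RETWEET,Right,RightTroll
B,,,left,LeftTroll
C,Unknown,QUOTE_TWEET,,Unknown
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
distinct, nullable = summarize_columns(
    reader, {"region", "post_type", "account_type", "account_category"}
)

for column in sorted(distinct):
    marker = " (nullable)" if column in nullable else ""
    print(f"{column}{marker}: {sorted(distinct[column])}")
```

Note that this only shows which columns were empty in the rows it saw; a column that is never empty in the data could still be nullable by design.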

bet4a, Aug 10 '18 21:08

A schema can be found in my project: https://github.com/EvanCarroll/russian-troll-tweets/blob/master/PostgreSQL/create.psql

EvanCarroll, Aug 25 '18 16:08

New version 2.0 schema for PostgreSQL. We now have primary keys (unique Twitter IDs) and int8 account IDs. https://github.com/EvanCarroll/russian-troll-tweets/blob/version_2/PostgreSQL/create.psql
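
As a rough illustration of what those two constraints imply for the data, the sketch below checks that a key column is unique and that account IDs parse as integers within the signed 64-bit range of PostgreSQL's int8. It is not taken from the linked schema: the column names `tweet_id` and `external_author_id`, the helper `check_keys`, and the sample rows are all assumptions made for the example.

```python
# Range of PostgreSQL int8 (bigint): signed 64-bit integer.
INT8_MIN = -(2 ** 63)
INT8_MAX = 2 ** 63 - 1


def check_keys(rows, key_column="tweet_id", account_column="external_author_id"):
    """Return (duplicate key values, account IDs that do not fit int8).

    Column names are hypothetical defaults; pass the real ones for your files.
    """
    seen = set()
    duplicates = []
    bad_accounts = []
    for row in rows:
        key = row[key_column]
        if key in seen:
            # A repeated key would violate a primary key constraint.
            duplicates.append(key)
        seen.add(key)

        raw_account = row[account_column]
        try:
            account = int(raw_account)
        except (TypeError, ValueError):
            # Not a plain integer (e.g. empty, or in scientific notation).
            bad_accounts.append(raw_account)
            continue
        if not INT8_MIN <= account <= INT8_MAX:
            bad_accounts.append(raw_account)
    return duplicates, bad_accounts


# Hypothetical rows for illustration only.
sample_rows = [
    {"tweet_id": "1", "external_author_id": "906000000000000000"},
    {"tweet_id": "2", "external_author_id": "9.06E+17"},
    {"tweet_id": "1", "external_author_id": "12"},
]

duplicates, bad_accounts = check_keys(sample_rows)
print("duplicate keys:", duplicates)
print("account ids not usable as int8:", bad_accounts)
```

Running a check like this before loading makes it easier to see which rows would be rejected by a primary key or an int8 column.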

EvanCarroll, Aug 27 '18 22:08