New Twitter Features: Few Suggestions, Request for Further Suggestions

Open: ianmilligan1 opened this issue 9 years ago • 1 comment

Right now we've got URL extraction, language extraction, hashtag extraction, and image extraction. We should have a few more features documented. I think this could begin with:

  • plain text extraction -> one line per tweet text;
  • user extraction (e.g. top ten users in a corpus);
  • retweeted tweet tracking (e.g. top ten retweeted tweets).
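To make the three proposed features concrete, here is a minimal Python sketch of what each would compute. It is not warcbase code: the function names are hypothetical, the input is assumed to be line-delimited tweet JSON (one tweet per line), and the field names (`text`, `user.screen_name`, `retweeted_status.id_str`) are assumed to follow the Twitter v1.1 tweet object.

```python
import json
from collections import Counter


def parse_tweets(lines):
    """Yield one dict per non-empty line of line-delimited tweet JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def plain_text(tweets):
    """Return one string per tweet, with internal whitespace and
    newlines collapsed so each tweet fits on a single line."""
    return [" ".join(t["text"].split()) for t in tweets]


def top_users(tweets, n=10):
    """Return the n most frequent screen names as (name, count) pairs."""
    return Counter(t["user"]["screen_name"] for t in tweets).most_common(n)


def top_retweeted(tweets, n=10):
    """Return the n most retweeted original tweet IDs as (id, count) pairs.
    Only tweets that carry a retweeted_status object are counted."""
    counts = Counter(
        t["retweeted_status"]["id_str"]
        for t in tweets
        if "retweeted_status" in t
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical three-tweet corpus, serialized the way a harvest file would be.
    corpus = [
        {"id_str": "1", "text": "hello\nworld", "user": {"screen_name": "alice"}},
        {"id_str": "2", "text": "RT @alice: hello world",
         "user": {"screen_name": "bob"}, "retweeted_status": {"id_str": "1"}},
        {"id_str": "3", "text": "RT @alice: hello world",
         "user": {"screen_name": "bob"}, "retweeted_status": {"id_str": "1"}},
    ]
    tweets = list(parse_tweets(json.dumps(t) for t in corpus))
    print("\n".join(plain_text(tweets)))
    print(top_users(tweets))
    print(top_retweeted(tweets))
```

Counting retweets by the ID of the embedded original, rather than by text, keeps truncated or edited "RT" texts from splitting one tweet into several entries.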

There may be other requests, so please let us know in this issue.

ianmilligan1, Apr 06 '16 17:04

We might as well list everything we can do with the twarc utils. The ones I use heavily:

  • deduplicate
  • embeds (embedded media in a tweet)
  • filter_date
  • geojson
  • ids
  • retweets
  • tags (hashtags)
  • unshorten
  • users
  • validate
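As a rough illustration of what two of these utilities do conceptually, here is a small Python sketch of deduplication and ID extraction. This is not twarc's own code: the function names are hypothetical, and it assumes line-delimited tweet JSON in which every tweet has an `id_str` field.

```python
import json


def deduplicate(lines):
    """Yield each tweet line once, keeping the first occurrence of every ID."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet_id = json.loads(line)["id_str"]
        if tweet_id not in seen:
            seen.add(tweet_id)
            yield line


def ids(lines):
    """Yield the ID of every tweet, one per non-empty input line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)["id_str"]


if __name__ == "__main__":
    # Hypothetical harvest containing one repeated tweet.
    harvest = ['{"id_str": "1"}', '{"id_str": "2"}', '{"id_str": "1"}']
    print(list(ids(deduplicate(harvest))))
```

Both functions are generators, so they can be chained over a large harvest file without loading it into memory, although the set of seen IDs still grows with the number of distinct tweets.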

jq queries:

  • extract text

ruebot, Apr 06 '16 19:04