warcbase
New Twitter Features: Few Suggestions, Request for Further Suggestions
Right now we've got URL extraction, language extraction, hashtag extraction, and image extraction. We should have a few more features documented. I think this could begin with:
- plain text extraction -> one line per tweet text;
- user extraction (e.g. top ten users in a corpus);
- retweeted tweet tracking (e.g. top ten retweeted tweets).
There may be other requests, so please let us know in this issue.
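To make the three suggestions concrete, here is a minimal sketch in plain Python. It is not warcbase code. It assumes the corpus is line-delimited tweet JSON in the Twitter v1.1 layout (fields `text`, `user.screen_name`, `retweeted_status.id_str`), and the function names are hypothetical.

```python
import json
from collections import Counter


def iter_tweets(lines):
    """Yield one parsed tweet per non-empty line of line-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def tweet_texts(tweets):
    """Return one string per tweet, collapsing internal whitespace and
    newlines so each tweet fits on a single output line."""
    return [" ".join(t.get("text", "").split()) for t in tweets]


def top_users(tweets, n=10):
    """Return the n most frequent (screen_name, tweet_count) pairs."""
    counts = Counter(t["user"]["screen_name"] for t in tweets if "user" in t)
    return counts.most_common(n)


def top_retweeted(tweets, n=10):
    """Return (original_id, retweet_count, original_text) for the n tweets
    retweeted most often within this corpus."""
    counts = Counter()
    texts = {}
    for t in tweets:
        original = t.get("retweeted_status")  # present only on retweets (assumed v1.1 layout)
        if original:
            counts[original["id_str"]] += 1
            texts[original["id_str"]] = original.get("text", "")
    return [(tid, c, texts[tid]) for tid, c in counts.most_common(n)]
```

A typical use would be `tweets = list(iter_tweets(open("tweets.json")))`, where the file name is a placeholder, followed by calls to the three functions. Note that `top_retweeted` counts retweets observed in the corpus rather than trusting a `retweet_count` field.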
Might as well list everything we can do with twarc utils. The twarc utils that I use heavily:
- deduplicate
- embeds (embedded media in a tweet)
- filter_date
- geojson
- ids
- retweets
- tags (hashtags)
- unshorten
- users
- validate
jq queries:
- extract text
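For readers unfamiliar with the utilities listed above, here is a rough sketch of what two of them (deduplicate and ids) amount to. This is an illustration, not twarc's actual implementation. It assumes tweets are already parsed into dicts with a v1.1-style `id_str` field, and the function names are hypothetical.

```python
def deduplicate(tweets):
    """Yield each tweet once, keeping the first occurrence of every id."""
    seen = set()
    for tweet in tweets:
        tweet_id = tweet["id_str"]  # assumed v1.1 field name
        if tweet_id not in seen:
            seen.add(tweet_id)
            yield tweet


def tweet_ids(tweets):
    """Return the tweet ids in corpus order."""
    return [tweet["id_str"] for tweet in tweets]
```

Chaining the two, as in `tweet_ids(deduplicate(tweets))`, gives a duplicate-free id list for a corpus.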