Laura Wrubel
This was requested by a GW researcher who wanted to search for words in an exported spreadsheet, or otherwise read the full text of retweets there.
Work with @Synkronicity to implement this.
Update: https://github.com/gwu-libraries/sfm-ui/wiki/Release-process
No data file is provided because sfmutils.warc_iter reports in the sfm_weiboexporter_1 container log: "Processed 0 records. Yielded 0 items" and "Bad json in record" for each WARC in the collection....
The JSON also looks fine when reviewed in the source WARCs. The issue seems to be with warc_iter.py or the warcio library, but it needs more research.
I've done some preliminary work on integrating @sebastian-nagel's harvester code side-by-side with existing Twitter v1 REST API harvests and wired some of the harvest types up to sfm-ui. See: *...
Currently on Ubuntu 18.
Use wording consistent with #681.
Discuss with @justinlittman whether to gzip all the files together or gzip individual files. Priority is full JSON export type.
For this ticket, gzip individual files, not the whole set of export files.
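A minimal sketch of the per-file approach, assuming the export files have already been written to disk. The function name `gzip_each_file` and the `remove_original` flag are illustrative only and are not taken from SFM's exporter code.

```python
import gzip
import shutil
from pathlib import Path


def gzip_each_file(paths, remove_original=False):
    """Compress each file to its own .gz file and return the new paths."""
    compressed = []
    for path in map(Path, paths):
        # e.g. export_001.json -> export_001.json.gz (hypothetical file name)
        target = path.with_name(path.name + ".gz")
        # Stream in chunks so large JSON exports are not loaded into memory.
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        if remove_original:
            path.unlink()
        compressed.append(target)
    return compressed
```

Compressing each file separately means a user can download and decompress a single export segment without unpacking the whole set.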