How to maintain an ongoing archive?
This question is related to https://github.com/timhutton/twitter-archive-parser/issues/147, which received a long answer, but unfortunately, after reading it, I'm still not sure...
My use case for your tool is not a "goodbye!" archive: I'm still using Twitter and will probably continue to do so for a while. I want an archive of my tweets because older tweets are hard to access through the standard interface, because tweets I retweeted or quote-tweeted are sometimes deleted by their poster, and because someday Twitter might just disappear suddenly, without even the ability to request a last archive. (Who knows – just recently they announced they'll monetize API access.)
Because of my ongoing activity, I'd like to maintain an ongoing archive, i.e. periodically add new tweets while keeping those already archived (even if they were deleted in the meantime). A secondary concern is avoiding re-downloading material that has already been archived, and I'd prefer not to get my IP address blacklisted by Twitter for excessive requests.
Since the script processes archives from the official Twitter archive function, there is no way around periodically requesting and downloading new archives. My idea would be to unzip each new archive into the old folder, run parser.py again, and have only the new material added to parser-output.
My impression from the issue cited above is that at the moment, twitter-archive-parser is not made for that. Things may be getting there, but currently there is no guarantee. That would mean that for the moment I should unzip each new archive into its own folder, and run parser.py on that – and maybe later figure out a way to merge a series of such folders.
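For the later merging step, here is a minimal sketch of one possible approach. This is not part of twitter-archive-parser; it assumes each unzipped archive folder contains a `data/tweets.js` file in the format Twitter currently ships (a JSON array prefixed with a JavaScript assignment), and it unions the tweets across folders keyed by tweet id, so tweets deleted on Twitter but present in an older archive survive:

```python
import json
import re
from pathlib import Path

def load_tweets(tweets_js: Path) -> list[dict]:
    """Read a data/tweets.js file from an unzipped Twitter archive.
    The file is JSON prefixed with an assignment like
    'window.YTD.tweets.part0 = ', which we strip off first."""
    text = tweets_js.read_text(encoding="utf-8")
    json_text = re.sub(r"^window\.YTD\.[\w.]+\s*=\s*", "", text, count=1)
    return json.loads(json_text)

def merge_archives(archive_dirs: list[Path]) -> list[dict]:
    """Union the tweets from several archive folders, keyed by tweet id.
    Later folders overwrite earlier ones for the same id, but tweets
    missing from newer archives (e.g. deleted in the meantime) are kept."""
    merged: dict[str, dict] = {}
    for archive in archive_dirs:
        for entry in load_tweets(archive / "data" / "tweets.js"):
            merged[entry["tweet"]["id_str"]] = entry
    # Tweet ids increase over time, so sorting numerically gives
    # newest-first order.
    return sorted(merged.values(),
                  key=lambda e: int(e["tweet"]["id_str"]),
                  reverse=True)
```

The merged list could then be written back out in the same `window.YTD.tweets.part0 = ...` format and fed to parser.py as a single combined archive – though, as noted above, whether that round-trips cleanly through the tool is exactly what's not guaranteed right now.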
Is that correct?
And are there plans to support what I have in mind at some point?
Btw. thanks for your already brilliant tool!
FWIW I'm doing

```shell
cd twitter-archive/twitter-archive-site && python3 parser.py
docker run --rm -v /transition/twitter-archive/twitter-archive-site/output:/src klakegg/hugo
```
... but my last update didn't bring new content, so I was wondering if that's due to ~~Twitter removing free API access and thus how often I might use this anymore~~ me forgetting that this step has to be done manually. I just did so today and refreshed with that script without issue. IMHO the "hard" part is Twitter preventing automation, but I'm curious to learn how others are doing it. I might just do a weekly quasi-manual update.
@Utopiah I don't understand. The first line is simply how the Twitter archive parser is called. The second line seems to build a website out of the result? How does this help with my issue?