twitter-archive-parser
Support for command line arguments
Adds built-in support for some optional command line arguments: reading from a different folder than the one it writes to, automatically answering "yes" or "no" to downloading user data and/or better images, and an offline mode.
```
--archive-folder ARCHIVE_FOLDER
                      path to the twitter archive folder
--get-users {yes,no,default}
                      always/never download missing user info from Twitter.
                      Default behaviour is asking each time.
--better-images {yes,no,default}
                      always/never download best quality version of images from Twitter.
                      Default behaviour is asking at runtime.
--offline             offline mode: only convert local archive files, don't try
                      to download anything from Twitter
```
More arguments can be added easily. For example, an option to run only a specific part of the parsing might be useful if one wants to update parts of an already-parsed archive.
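For reference, a minimal sketch of how these options could be wired up with Python's argparse; the flag names mirror the help text above, while the defaults and help strings are illustrative assumptions rather than the script's actual code:

```python
import argparse

# Illustrative wiring of the proposed options; flag names mirror the help
# text above, defaults and descriptions are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("--archive-folder", default=".",
                    help="path to the twitter archive folder")
parser.add_argument("--get-users", choices=["yes", "no", "default"],
                    default="default",
                    help="always/never download missing user info from Twitter")
parser.add_argument("--better-images", choices=["yes", "no", "default"],
                    default="default",
                    help="always/never download best quality images from Twitter")
parser.add_argument("--offline", action="store_true",
                    help="only convert local archive files, download nothing")
args = parser.parse_args()
```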
What problem is this solving? None of our users should need to run the script more than once.
> What problem is this solving? None of our users should need to run the script more than once.

I've run it many times already... at least once for every time I learned that a new feature was added.

I think it would mostly be useful for automation, e.g. if you have archives of many different Twitter accounts with a lot of data and want to let the parsing run while you're doing something else: you could write a shell command to parse all your archives in a row, which might take a few hours with all the image downloads. And it would be useful not to have to sit next to it waiting for the input prompts.
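To illustrate, an unattended batch run over several archives might look like the sketch below; `parser.py` and the folder names are placeholders, and the flags are the ones proposed in this PR:

```python
import subprocess

# Hypothetical unattended batch run: parse several downloaded archives in a
# row, answering "yes" to all download prompts via the proposed flags.
archives = ["archives/alice", "archives/bob", "archives/carol"]
for folder in archives:
    subprocess.run(
        ["python", "parser.py",
         "--archive-folder", folder,
         "--get-users", "yes",
         "--better-images", "yes"],
        check=True,  # stop if any archive fails to parse
    )
```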
> I've run it many times already... at least once for every time I learned that a new feature was added.

Me too. But we're not the users.

> I think it would mostly be useful for automation, e.g. if you have archives of many different Twitter accounts with a lot of data and want to let the parsing run while you're doing something else: you could write a shell command to parse all your archives in a row, which might take a few hours with all the image downloads. And it would be useful not to have to sit next to it waiting for the input prompts.

In theory, yes. But I haven't seen anyone doing this or asking for it.
In general, I suggest getting consensus on a feature first, in the issues.
We had a (German) conversation on Mastodon today with some (potential) users. While this tool was recommended by multiple other participants, there was some uncertainty about whether it is ready for use right now, or whether they should wait for more bugs to be fixed or features to be added.
The consensus was: run the tool as early as possible, and re-run it whenever significant features have been added, because Twitter (or relevant parts of it) might go down at any time.
Of course, no one knows how many days or weeks it might take until either Twitter fails or this tool becomes feature-complete, but I'd argue that re-running it every now and then might become a routine for many users who care about their archive data. And answering the same questions over and over becomes annoying pretty quickly.
Interesting, thanks for the perspectives.
> We had a (German) conversation on Mastodon today with some (potential) users. While this tool was recommended by multiple other participants, there was some uncertainty about whether it is ready for use right now, or whether they should wait for more bugs to be fixed or features to be added.

I'm wondering where that uncertainty is coming from. Maybe we should remove the TODO section from the front page? Or communicate that there's no problem running it more than once? That we do no damage to the archive?

> The consensus was: run the tool as early as possible, and re-run it whenever significant features have been added, because Twitter (or relevant parts of it) might go down at any time.

This is definitely the right advice, I think.

> Of course, no one knows how many days or weeks it might take until either Twitter fails or this tool becomes feature-complete, but I'd argue that re-running it every now and then might become a routine for many users who care about their archive data. And answering the same questions over and over becomes annoying pretty quickly.
I can see value in what you're saying. But a few things are holding me back:
- Putting extended instructions on the front page makes the tool look harder to use. It's more stuff to read, even if you decide that you can ignore it all.
- The change adds ~50 lines of code to a script that is 600 lines.
- It's an extra source of confusion when the code changes: running with some options might no longer be valid, others might no longer be the choice you wanted. When websites publish blog posts about how to use the tool (there are at least two out there already, plus many more social media posts), the options they recommend will be out of date.
- It adds maintenance overhead - we will need to consider how the options should change, how they will now interact with each other, how to communicate the changes.
- It adds inertia - the worry about confusion and maintenance will reduce our appetite for change.
How about this instead:
- We keep a single enum/int in the script for [never_download, always_download, always_ask (default)] (sketched below).
- We print out at the end that there are flags you can set in the code if you want to get rid of the questions.
- Obviously the flags you set will get overwritten by a new version, but I think that's reasonable because they may no longer be valid anyway. Or, if we really think that's a burden, we could consider loading them from a config file if present, and document it in the script.
- At the same time, we work to reduce the number of questions asked. For example: merge the download of handles for followings and DMs (leaving followers separate, unless small in number).
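A minimal sketch of that idea, including the optional config-file fallback mentioned above; `DownloadMode`, `parser_config.json` and `should_download` are hypothetical names for illustration, not the script's actual code:

```python
import json
import os
from enum import Enum

class DownloadMode(Enum):
    ALWAYS_ASK = "default"   # default: prompt at runtime
    ALWAYS_DOWNLOAD = "yes"
    NEVER_DOWNLOAD = "no"

# In-script defaults; users can edit these to silence the prompts.
GET_USERS = DownloadMode.ALWAYS_ASK
BETTER_IMAGES = DownloadMode.ALWAYS_ASK

# Optional: override the flags from a config file if one exists
# (hypothetical filename and keys).
if os.path.exists("parser_config.json"):
    with open("parser_config.json") as f:
        config = json.load(f)
    GET_USERS = DownloadMode(config.get("get_users", GET_USERS.value))
    BETTER_IMAGES = DownloadMode(config.get("better_images", BETTER_IMAGES.value))

def should_download(mode: DownloadMode, question: str) -> bool:
    """Resolve a yes/no decision, prompting only when the mode says to ask."""
    if mode is DownloadMode.ALWAYS_DOWNLOAD:
        return True
    if mode is DownloadMode.NEVER_DOWNLOAD:
        return False
    return input(f"{question} [y/n] ").strip().lower().startswith("y")
```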
(Closing this after copying the relevant info to #111.)