twitter-archive-parser

Keep download state and do not attempt to redownload images and user handles

Open ixs opened this issue 3 years ago • 5 comments

Add a new json file (media/media_state.json) that keeps track of download status for media files.

When re-running the archive-parser this file is consulted and if a media file has previously been downloaded, skip any attempt to check file size etc.

This is great for testing code changes etc. because we're not hammering the twitter servers anymore. No need to hasten their demise.

Added: Persistent state file for media downloads.
Added: Skip downloads if the media state indicates a previous successful download.
Drive-By: Capitalize the N choice in prompts (y/N) to indicate the default choice.
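
For readers who want a concrete picture of the state file, here is a minimal sketch of the idea, not the code from this PR. The helper names (`load_state`, `save_state`, `download_if_needed`), the `fetch` callback, and the JSON layout (`{url: {"success": true}}`) are all assumptions made for illustration; only the `media/media_state.json` path comes from the description above.

```python
import json
from pathlib import Path


def load_state(state_path: Path) -> dict:
    """Return the saved download state, or an empty dict on the first run."""
    if state_path.exists():
        with state_path.open("r", encoding="utf-8") as f:
            return json.load(f)
    return {}


def save_state(state_path: Path, state: dict) -> None:
    """Write the state back to disk so the next run can consult it."""
    state_path.parent.mkdir(parents=True, exist_ok=True)
    with state_path.open("w", encoding="utf-8") as f:
        json.dump(state, f, indent=2, sort_keys=True)


def download_if_needed(url: str, state: dict, fetch) -> str:
    """Skip URLs already marked successful; otherwise fetch and record the result.

    `fetch` is a hypothetical callable that returns True on success.
    Returns "skipped", "downloaded", or "failed".
    """
    if state.get(url, {}).get("success"):
        return "skipped"  # no request and no file-size check
    ok = bool(fetch(url))
    state[url] = {"success": ok}
    return "downloaded" if ok else "failed"
```

A run would then call `load_state(Path("media/media_state.json"))` once, pass the dictionary to each download, and call `save_state` at the end. Failed downloads are recorded with `"success": false`, so they are retried on the next run.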

ixs avatar Nov 19 '22 10:11 ixs

Hi @ixs. Sorry, I've completely ignored this PR. Thanks for sending it.

timhutton avatar Nov 23 '22 19:11 timhutton

No worries. Reworking this right now to also keep the user data... New PR coming in a few mins.

ixs avatar Nov 23 '22 19:11 ixs

Reworked the download state cache a bit to also work for user lookups. I'm not super happy with the way we're now passing the state dictionary down through three functions to the actual get_twitter_users() function, but the alternative would be a global variable, which is also not that nice...

Maybe should rework everything into a class and then have a class level state... 🤣

I'd appreciate a look at the get_twitter_users() logic. I believe I am trimming and caching correctly, but there are a bunch of accounts that parser.py tries to download over and over again. It looks like these are deleted accounts that don't exist on the platform anymore, but I'd like to have a second pair of eyes on that.
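
If the repeated lookups really are deleted accounts (that is still a guess), one possible approach is to cache the negative result as well, so an ID that came back empty is not requested again. The sketch below is not the get_twitter_users() code from this PR; `lookup_users`, the `fetch_batch` callback, and the `{"handle": ...}` state layout are hypothetical names chosen for illustration.

```python
def lookup_users(user_ids, state, fetch_batch):
    """Resolve user IDs to handles, consulting and updating the state cache.

    `fetch_batch` is a hypothetical callable that takes a list of IDs and
    returns a dict {user_id: handle} containing only the accounts it found.
    """
    # Only ask the server about IDs we have never looked up before.
    pending = [uid for uid in user_ids if uid not in state]
    if pending:
        found = fetch_batch(pending)
        for uid in pending:
            # Record misses too (handle=None), so deleted accounts
            # are not requested again on the next run.
            state[uid] = {"handle": found.get(uid)}
    return {
        uid: state[uid]["handle"]
        for uid in user_ids
        if state[uid]["handle"] is not None
    }
```

The trade-off is that an account which was only temporarily unavailable stays marked as missing until its entry is removed from the state file.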

ixs avatar Nov 23 '22 20:11 ixs

@ixs Passing a value down through several layers of functions is completely fine. Large classes are bad because they just become a repository of almost-globals. Even small classes with both data and member functions are often bad because they're stateful. Classes with just data or just functions are fine.

timhutton avatar Nov 24 '22 23:11 timhutton

There are some merge conflicts now; try rebasing?

cooljeanius avatar Jun 17 '24 10:06 cooljeanius