support full twitter archive format

Open • codl opened this issue 6 years ago • 9 comments

Archive imports have been disabled because Twitter have stopped offering the kind of archive that Forget knew about.

The new archive format is much larger because it includes media, and it is a privacy nightmare because it includes everything Twitter knows about a user: DMs, ad targeting info, the whole mess. The old strategy of uploading the zip file and letting the server figure it out is not going to work.

The plan is to extract and parse archives in the browser, and send batches of statuses to the server.
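A minimal sketch of the browser-side half of that plan. It assumes `tweet.js` is a JavaScript assignment of the form `window.YTD.tweet.part0 = [ ... ]` whose elements are either tweet objects or `{ "tweet": { ... } }` wrappers, and that the server only needs each status's `id_str` and `created_at`. `StatusStub`, `parseTweetJs` and `toBatches` are hypothetical names, not part of Forget.

```typescript
// Sketch: turn the text of a full-archive tweet.js into small batches of
// status stubs that can be sent to the server one request at a time.
// Assumed input shape: window.YTD.tweet.part0 = [ { "tweet": { ... } }, ... ]
// (some exports may list the tweet objects without the "tweet" wrapper).

interface StatusStub {
  id: string;        // id_str, kept as a string because tweet IDs exceed 2^53
  createdAt: string; // created_at, passed through unparsed
}

function parseTweetJs(source: string): StatusStub[] {
  // Everything after the first "=" should be a JSON array literal.
  const eq = source.indexOf("=");
  if (eq < 0) throw new Error("unrecognized tweet.js: no assignment found");
  const json = source.slice(eq + 1).trim().replace(/;$/, "");
  const items = JSON.parse(json) as Array<Record<string, any>>;
  return items.map((item) => {
    const tweet = item.tweet ?? item; // unwrap { "tweet": {...} } if present
    return { id: String(tweet.id_str), createdAt: String(tweet.created_at) };
  });
}

function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With a batch size of 100 (an arbitrary choice), an archive of 30,000 tweets would go up in 300 requests, and a failed request only means retrying one batch rather than the whole upload.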

Tasks for this project are tracked in individual issues: #458 and #459.

Original issue text follows:


reported by @rrix@cybre.space: https://cybre.space/@rrix/100591445673683791

hey, it looks like the Twitter archive format changed at some point in a way that breaks Forget. there's no longer a data/js/tweets dir with monthly files, just a big JSONP file (67 MiB in my case) in the root of the zip

codl, Aug 22 '18 08:08

?? i just requested a twitter archive and it still has /data/js/tweets. no big jsonp file at the root either. maybe this is some kinda A/B test?

@rrix can you provide an example archive? or more details about the format of the file & its filename?

codl, Aug 22 '18 17:08

Hey, sorry about the delay. I finally found a browser session that was logged in to GitHub.

The file is a "little" large, I'll upload a version of it with media files dropped out of it to my nextcloud and DM you a link to it through the fediverse.

rrix, Aug 28 '18 23:08

ok i get it now. it's not a new format, it's a different archive. it's the full account archive you get from https://twitter.com/settings/your_twitter_data instead of the tweet archive you get from https://twitter.com/settings/account#tweet_export
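The two archives can be told apart from the file paths inside the zip alone, without reading any file contents. A hedged sketch, assuming the tweet archive keeps its monthly files under `data/js/tweets/` (as described earlier in this thread) and the full account archive contains a single `tweet.js`, either at the root or in a subfolder. `classifyArchive` is a hypothetical helper.

```typescript
// Hypothetical helper: tell the two Twitter archives apart by path alone.
type ArchiveKind = "tweet-archive" | "full-account-archive" | "unknown";

function classifyArchive(paths: string[]): ArchiveKind {
  // Tweet archive: monthly .js files under data/js/tweets/
  if (paths.some((p) => /(^|\/)data\/js\/tweets\/[^/]+\.js$/.test(p))) {
    return "tweet-archive";
  }
  // Full account archive: one large tweet.js (at the root or in a subfolder)
  if (paths.some((p) => /(^|\/)tweet\.js$/.test(p))) {
    return "full-account-archive";
  }
  return "unknown";
}
```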

it could be supported, but it would require extracting the zip in the browser and parsing it there, cos we can't reasonably upload gigabytes of images and videos just to get a few thousand tweet IDs. I'm not going to do that, but if someone's up to the task I'd be happy to help and to merge it in
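What makes in-browser extraction practical is that a zip file's central directory, stored at the end of the file, lists every entry's name, size and offset, so the browser can locate `tweet.js` and slice out just its bytes without decompressing any media. A minimal sketch of that lookup, assuming a single-disk archive without ZIP64 or encryption (so it would not cope with archives over 4 GiB as written). `listZipEntries` and `entryBytes` are hypothetical names.

```typescript
// Sketch: list the entries of a zip file from its central directory, so a
// single entry (tweet.js) can be found without decompressing anything else.
// Assumptions: single-disk archive, no ZIP64, no encryption.

interface ZipEntry {
  name: string;
  method: number; // 0 = stored, 8 = deflate
  compressedSize: number;
  localHeaderOffset: number;
}

function listZipEntries(buf: Uint8Array): ZipEntry[] {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  // The end-of-central-directory record is 22 bytes plus an optional comment of
  // up to 65535 bytes, so scan backwards from the end for its signature.
  let eocd = -1;
  for (let i = buf.length - 22; i >= Math.max(0, buf.length - 22 - 0xffff); i--) {
    if (view.getUint32(i, true) === 0x06054b50) { eocd = i; break; }
  }
  if (eocd < 0) throw new Error("no end-of-central-directory record; not a zip file?");

  const count = view.getUint16(eocd + 10, true); // total number of entries
  let p = view.getUint32(eocd + 16, true);       // offset of the central directory
  const decoder = new TextDecoder();
  const entries: ZipEntry[] = [];
  for (let n = 0; n < count; n++) {
    if (view.getUint32(p, true) !== 0x02014b50) throw new Error("corrupt central directory");
    const nameLen = view.getUint16(p + 28, true);
    const extraLen = view.getUint16(p + 30, true);
    const commentLen = view.getUint16(p + 32, true);
    entries.push({
      name: decoder.decode(buf.subarray(p + 46, p + 46 + nameLen)),
      method: view.getUint16(p + 10, true),
      compressedSize: view.getUint32(p + 20, true),
      localHeaderOffset: view.getUint32(p + 42, true),
    });
    p += 46 + nameLen + extraLen + commentLen;
  }
  return entries;
}

// Returns the raw bytes of one entry (still compressed if method is 8).
function entryBytes(buf: Uint8Array, entry: ZipEntry): Uint8Array {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const h = entry.localHeaderOffset;
  if (view.getUint32(h, true) !== 0x04034b50) throw new Error("corrupt local file header");
  // The local header carries its own name/extra lengths, which can differ
  // from the ones in the central directory.
  const start = h + 30 + view.getUint16(h + 26, true) + view.getUint16(h + 28, true);
  return buf.subarray(start, start + entry.compressedSize);
}
```

A deflated entry (method 8) would still need inflating, for example with the browser's `DecompressionStream("deflate-raw")`. For multi-gigabyte archives it would also be kinder to read only the tail of the `File` with `Blob.slice()` rather than loading the whole thing into memory.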

what i will do is document it, link to the right page, and pop a warning before uploading if the archive is more than, say, 25 MB. mine is 6.3 MB with 30k-ish tweets, so i figure that's a safe value
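A sketch of that pre-upload check, taking "25 MB" to mean 25,000,000 bytes (an assumption). Scaling from the figures above, 6.3 MB for about 30k tweets is roughly 210 bytes per tweet, so the threshold leaves room for about 119,000 tweets before the warning fires. The function names are hypothetical.

```typescript
// Sketch of the "pop a warning before uploading" check.
// Assumption: "25 MB" is read as 25,000,000 bytes.
const WARN_THRESHOLD_BYTES = 25_000_000;

function shouldWarnBeforeUpload(file: { size: number }): boolean {
  return file.size > WARN_THRESHOLD_BYTES;
}

// Rough number of tweets that fit under the threshold, scaled from one
// known archive (its size in bytes and its tweet count).
function tweetsAtThreshold(sampleBytes: number, sampleTweets: number): number {
  const bytesPerTweet = sampleBytes / sampleTweets;
  return Math.floor(WARN_THRESHOLD_BYTES / bytesPerTweet);
}
```

`tweetsAtThreshold(6_300_000, 30_000)` gives 119047, so the warning should only fire for archives about four times the size of the one above. A `File` from an `<input type="file">` has a `size` property, so it can be passed to `shouldWarnBeforeUpload` directly.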

codl, Oct 04 '18 21:10

it sure did take me a whole month to figure that out huh 😅

codl, Oct 04 '18 21:10

Oh wow, I had no idea there were two different things there. I'll upload the archive you're expecting, thanks for the pointer and investigating!

rrix, Oct 04 '18 23:10

the old "tweets archive" format has apparently been phased out. support for the new archive format is now essential. reopening

codl, Sep 13 '19 02:09

so, not only is the new format huge and inconvenient to upload, it also has a lot of personal data that i would rather never came even close to my server

my current plan is:

  1. let users upload the tweet.js file from within that archive (like #119 suggested)
  2. if the user selects a full zip archive, unzip it in browser and only upload tweet.js
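A sketch of the two-step plan above. Archive entries are modelled as lazy readers (a hypothetical `Map` from path to a function returning the file's text), so only `tweet.js` is ever decoded and the DMs, ad data and media in the archive stay untouched. How the zip gets listed and inflated is out of scope here, and `pickTweetJs` is an illustrative name.

```typescript
// Hypothetical sketch: choose the only file that should ever leave the browser.
// Archive entries are lazy readers, so files other than tweet.js are never read.
type EntryReader = () => string;

function pickTweetJs(fileName: string, payload: string | Map<string, EntryReader>): string {
  if (typeof payload === "string") {
    // Step 1: the user selected tweet.js itself; pass its text through.
    if (!fileName.toLowerCase().endsWith(".js")) {
      throw new Error(`expected a .js file, got ${fileName}`);
    }
    return payload;
  }
  // Step 2: the user selected a full zip archive; find tweet.js by path only.
  const matches = Array.from(payload.keys()).filter((path) => /(^|\/)tweet\.js$/.test(path));
  if (matches.length !== 1) {
    throw new Error(`expected exactly one tweet.js in the archive, found ${matches.length}`);
  }
  return payload.get(matches[0])!();
}
```

In a real in-browser unzip the readers would be asynchronous; they are synchronous here only to keep the sketch short.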

codl, Sep 13 '19 02:09

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Sep 12 '20 10:09

Hi all. Sorry for taking so long. I intend to get this done by the end of the month.

codl, May 14 '21 12:05