
Capture Statuses?

Open humandecoded opened this issue 2 years ago • 4 comments

Would there be any interest in modding the tool to also capture previous statuses? The capability is already in there, just need to add it in.

humandecoded avatar Feb 20 '22 17:02 humandecoded

Hey Tom! Hope you're having a good day.

Thanks for bringing this to my attention. As it stands, the script should download all snapshots, and I think some people may not want that. So what I can do is add an option to download either all snapshots or only the latest one.

I'm almost done with the request, but I've been running into problems. As you can see here, the two scripts differ in only two places: an input statement asking the user to choose, and an if/elif/else statement that removes duplicate Twitter links when the user wants just one snapshot. The script works great for users who choose the "all" option, but it doesn't work for users who choose the "one" option. I'm trying to figure out why, and I'll get some help to see what's wrong. I'm pretty sure the dictionary logic at lines 125-127 is what generates the empty list, but I don't yet know why or how to fix it. Once I find a fix, I'll push it!
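For reference, one way to keep a single snapshot per Tweet is sketched below. This is not the script's actual code (the dictionary logic at lines 125-127 isn't shown in this thread), and it is not a diagnosis of the empty-list bug. It assumes each link is a standard Wayback URL with a 14-digit timestamp, and the name `latest_snapshot_per_tweet` is made up for illustration.

```python
import re

# Assumed link shape: https://web.archive.org/web/<14-digit timestamp>/<original URL>
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$")


def latest_snapshot_per_tweet(wayback_urls):
    """Return one Wayback URL per original Tweet URL, keeping the newest snapshot."""
    latest = {}  # normalized original URL -> (timestamp, full Wayback URL)
    for url in wayback_urls:
        match = WAYBACK_RE.match(url)
        if match is None:
            continue  # skip anything that isn't a recognizable Wayback link
        timestamp, original = match.groups()
        # Normalize so http/https, "www." and trailing-slash variants collapse together.
        key = re.sub(r"^https?://(www\.)?", "", original.lower()).rstrip("/")
        # 14-digit timestamps sort chronologically as plain strings.
        if key not in latest or timestamp > latest[key][0]:
            latest[key] = (timestamp, url)
    return [url for _, url in latest.values()]
```

Building the result from the dictionary's values, rather than deleting entries from the original list while looping over it, avoids one common way of accidentally ending up with nothing left.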

Thank you, and I hope you had a very good weekend.

Mennaruuk avatar Feb 21 '22 00:02 Mennaruuk

Whoops. Just realized I didn't phrase that correctly as it relates to Twitter. I meant to say, "Is there any interest in updating the script to capture profile changes that the Wayback Machine captures, as opposed to just individual tweets?"

humandecoded avatar Feb 21 '22 00:02 humandecoded

Ah. My rudimentary understanding of what you would like is a feature focused on profile changes. So for example, how has Elon Musk’s profile changed from one date to another. That’s a fantastic idea! I think it can be built upon: perhaps it can track changes in names, bios, and profile picture/cover picture URLs (although sometimes Wayback fails to save them).

I will definitely push this feature as soon as I finish it. Thank you for your suggestion, I appreciate it!

Mennaruuk avatar Feb 21 '22 03:02 Mennaruuk

I've been tinkering with the script for the past two weeks. For the most part I think this is done, and I've pushed it to dev:

https://github.com/Mennaruuk/twayback/blob/dev/twayback.py

To use, append:

-b for bios
-n for names
-p for profile pic
-h for header image

(You can combine them however you like.)
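As a rough illustration of how combinable flags like these can be parsed, here is a minimal sketch using Python's standard `argparse`. How the dev script actually reads its flags isn't shown in this thread, so `build_parser` and the `dest` names are assumptions. Note that `argparse` normally reserves `-h` for help, so the sketch turns that off.

```python
import argparse


def build_parser():
    """Hypothetical parser for the four profile flags described above."""
    # add_help=False frees up -h, which argparse otherwise reserves for --help.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-b", dest="bios", action="store_true", help="fetch bios")
    parser.add_argument("-n", dest="names", action="store_true", help="fetch names")
    parser.add_argument("-p", dest="profile_pic", action="store_true",
                        help="fetch profile pictures")
    parser.add_argument("-h", dest="header", action="store_true",
                        help="fetch header images")
    return parser


# Flags can be given separately ("-b -p") or merged ("-bp").
args = build_parser().parse_args(["-b", "-p"])
print(args.bios, args.names, args.profile_pic, args.header)
```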

Two issues I've noticed:

  1. Twitter didn't include bios and header images in the Tweets' HTML until around 2016, so for pre-2016 Tweets they can't be fetched from there. The only other way is to grab the profile's archived page directly (like twitter.com/biz) and extract bios and header images from it. That means building a whole new list of archived profile pages to fetch and extract from, and finding a way to accommodate variations: What if the profile was never archived? What if it was archived something like 10,000 times? If the end user wants four bios per year, should the bios come from links spaced about three months apart, or can all four come from a very short period of time? Lots of questions to answer.

  2. Screenshots don't really work for Tweets from before the latest Twitter redesign around 2016, so pre-2016 Tweets can't be screenshotted. I've been trying to make it work, to no avail. Python says that it "Cannot take screenshot with 0 height," so something's wrong with how Selenium locates the Tweet. I'll keep trying to fix it.
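On the spacing question in the first issue, one possible answer is sketched below: for each year, pick the archived snapshots closest to evenly spaced target dates, and simply return everything when a year has fewer snapshots than requested. It assumes the list of archived profile pages has already been reduced to 14-digit Wayback timestamps. The function name and the `per_year` parameter are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def pick_spaced_snapshots(timestamps, per_year=4):
    """Pick up to `per_year` timestamps per year, spread as evenly as possible."""
    by_year = defaultdict(list)
    for ts in sorted(set(timestamps)):
        by_year[ts[:4]].append(ts)

    picked = []
    for year in sorted(by_year):
        candidates = by_year[year]
        if len(candidates) <= per_year:
            picked.extend(candidates)  # too few snapshots: keep them all
            continue
        start = datetime(int(year), 1, 1)
        span = datetime(int(year) + 1, 1, 1) - start
        remaining = [(datetime.strptime(ts, "%Y%m%d%H%M%S"), ts) for ts in candidates]
        chosen = []
        for i in range(per_year):
            # Target the middle of each equal slice of the year.
            target = start + span * (i + 0.5) / per_year
            best = min(remaining, key=lambda pair: abs(pair[0] - target))
            remaining.remove(best)  # never pick the same snapshot twice
            chosen.append(best[1])
        picked.extend(sorted(chosen))
    return picked
```

A profile that was never archived just yields an empty list, and one archived 10,000 times is cut down to `per_year` entries per year, so the number of pages to fetch stays bounded.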

Mennaruuk avatar Mar 03 '22 18:03 Mennaruuk