twitter-archive-parser

Feature request: Retrieve ALT-text for images

Open · beadsland opened this issue 1 year ago · 9 comments

This data appears to be omitted from the archive entirely.

Side note: Another Mastodoner has offered an online tool that parses the .js file and grabs the ALTs, but it currently has UI issues that have so far foiled attempts to use it.

beadsland · Nov 12 '22 08:11

@beadsland I searched for ALT tags in my archive and I agree that they seem to be completely missing. Will keep this as a feature request.

timhutton · Nov 12 '22 14:11

This link in the README worked for me: https://archive.alt-text.org/

cooljeanius · Nov 12 '22 23:11

Yes, I have been working with the developer of alt-text.org on the UI and error conditions there.

It now seems to work reliably with an extracted tweets.js, or with the tweet-media.js generated by the recommended script. So dropping its output somewhere parser.py can reach ought to be sufficient to incorporate the ALT text into output.md.

beadsland · Nov 13 '22 04:11

Hi, I'm the author of https://archive.alt-text.org. Would you be willing to provide a way to fold in the results of my tool where possible? The output is a JSON file:

[
  {
    "tweet_id": "...",
    "media_key": "...",
    "media_url": "...",
    "alt_text": "..."
  },
  ...
]
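A minimal Python sketch of how parser.py might consume this file, assuming only the field names shown above: it indexes the ALT text by (tweet_id, media_url) and renders an image as a Markdown image whose label is the ALT text. The function names `alt_text_index`, `load_alt_text` and `markdown_image` are hypothetical, not part of parser.py.

```python
import json


def alt_text_index(entries):
    """Map (tweet_id, media_url) -> alt_text for entries that have ALT text.

    Assumes the archive.alt-text.org layout: a list of objects with
    tweet_id, media_key, media_url and alt_text fields.
    """
    return {
        (entry["tweet_id"], entry["media_url"]): entry["alt_text"]
        for entry in entries
        if entry.get("alt_text")  # skip media that has no ALT text
    }


def load_alt_text(path):
    """Read the exported JSON file and build the index (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return alt_text_index(json.load(f))


def markdown_image(url, alt_text):
    """Render one image as Markdown, using the ALT text as the image label."""
    safe_alt = (alt_text or "")
    # Escape square brackets so they cannot end the image label early.
    safe_alt = safe_alt.replace("[", "\\[").replace("]", "\\]")
    return f"![{safe_alt}]({url})"
```

Keying on the media URL rather than `media_key` is a design assumption: the archive's own tweet data exposes media URLs, so that key is the easier one to match against.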

hkolbeck · Nov 13 '22 05:11

Hello! Yes, perfect.

timhutton · Nov 13 '22 09:11

I just tested the Twitter Guest API implementation by @press-rouch and noticed that the result of get_tweet also contains the ALT text for attached media, at parse(likes.json)[tweet_id].extended_entities.media[index].ext_alt_text:

[Screenshot, 2022-11-19 14:36:35]
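A minimal sketch of reading that field from a downloaded tweet object, assuming it has already been parsed into a Python dict with the extended_entities.media[index].ext_alt_text path described above. The `media_url_https` key is an assumption based on the v1.1 media payload, and `alt_texts_from_tweet` is a hypothetical name.

```python
def alt_texts_from_tweet(tweet):
    """Return (media_url, ALT text) pairs for media that has ALT text.

    Assumes ALT text lives at extended_entities.media[i].ext_alt_text and
    is missing or null when the author did not provide any.
    """
    # Tweets without attached media have no extended_entities at all.
    media = (tweet.get("extended_entities") or {}).get("media") or []
    pairs = []
    for item in media:
        alt = item.get("ext_alt_text")
        if alt:
            # media_url_https is assumed to be present, as in v1.1 payloads.
            pairs.append((item.get("media_url_https"), alt))
    return pairs
```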

lenaschimmel · Nov 19 '22 13:11

If at all possible, I think it would be useful to allow folding in the https://archive.alt-text.org result. I'm not sure how the Twitter Guest API works, but my reason for going with a site was to let folks with no OAuth or terminal knowledge fetch their archives' alt text.

hkolbeck · Nov 20 '22 00:11

PR #97 has now been merged into the downloadtweets branch. It provides basic (still evolving) functionality to re-download the tweets that contain media and thus might contain ALT text. The downloaded tweets are not used in any way yet. See that PR for the overall plan on how to proceed.
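The first step, choosing which tweets to re-download, could look roughly like the sketch below. It assumes the archive's tweets.js is a JavaScript assignment such as `window.YTD.tweets.part0 = [...]` whose right-hand side is a JSON array of `{"tweet": {...}}` objects, and that tweets with media carry an `extended_entities.media` list. The function name `tweet_ids_with_media` is hypothetical and this is not the code from PR #97.

```python
import json


def tweet_ids_with_media(tweets_js_text):
    """Return the IDs of archived tweets that have attached media.

    Assumes the tweets.js layout described above; tweets without media
    are assumed to lack extended_entities.
    """
    # Drop the "window.YTD... =" prefix so the remainder parses as JSON.
    _, _, payload = tweets_js_text.partition("=")
    ids = []
    for entry in json.loads(payload):
        tweet = entry.get("tweet", entry)  # tolerate entries without the wrapper
        media = (tweet.get("extended_entities") or {}).get("media") or []
        if media:
            ids.append(tweet["id_str"])
    return ids
```

Only these IDs need to be fetched again, which keeps the number of requests proportional to the number of tweets with images rather than to the size of the whole archive.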

@hkolbeck I like that your website provides an alternative for users without terminal knowledge, etc. Luckily, our tool does not need OAuth or anything like that, since it operates without logging in or connecting to the account. I'm not sure whether or how you would like to integrate the result from your website, though, since that would obviously require terminal knowledge from the user again.

lenaschimmel · Nov 22 '22 10:11

@lenaschimmel Interesting, I didn't realize you were sharing your bearer token. I had assumed this would require folks to know how to get auth keys for the Twitter API. If this can fetch the ALT text natively, then there's no real reason to fold in my results. Thanks!

hkolbeck · Nov 22 '22 21:11