twitter-archive-parser
twitter-archive-parser copied to clipboard
Replace image URLs in DMs with links to local files
Mostly adapted from the code that does the same thing in the convert_tweet method.
Still missing the part that saves the online paths for later lookup of better quality version, because I don't know how the best version online path would look for an image from a DM, it might be very different from the ones in a tweet (assuming this because the structure of the archive's JSON is also different for DMs and tweets).
I also wrote hints for this as TODO comments in the code. And I'm neither sure if DMs can even contain videos anyway, nor if I understood the video URL parsing part correctly... please review and test this. For me, it runs without errors - but this still shouldn't be merged in the current unfinished state.
Help with the missing part would be much appreciated :)
I figured out the URL format for the original size images (still no idea about the videos). But they can't be retrieved (at least from the browser) without the correct cookie (which probably means being logged in as the archive's owner). Until that problem is solved, I think the URL should not even be added to the list of possible best-quality media downloads.
So I think this could be merged now (that is, if you find no need for changes unrelated to the best-quality media url part), because I think the cookie problem would get its own issue and PR anyway if it's even possible at all...
Is there duplicated code here, between parse_direct_messages() and convert_tweet()? I wonder if first we need to refactor convert_tweet()
into several smaller functions such that parse_direct_messages()
can reuse them.
Is there duplicated code here, between parse_direct_messages() and convert_tweet()? I wonder if first we need to refactor
convert_tweet()
into several smaller functions such thatparse_direct_messages()
can reuse them.
Currently, there is a little bit of duplicated code, but the JSON format of Tweets and DMs is so different that I think it would take a lot of abstraction (and thus, making the code even less readable than it already is) to make any extracted method usable in both cases. So I'm not sure if that would even be worth it.
This change doesn't embed images, sadly, since on L596 the body is inside ```...```
A rendering of the markdown looks like this:
We could consider remove the backquotes. It would break some DMs that have code in them, which appear like this:
"messageCreate" : {
"recipientId" : "15840592",
"reactions" : [ ],
"urls" : [ ],
"text" : "kernel void rd_compute(global float *a_in,global float *a_out)\n{\n // parameters:\n const float time ...
We could replace \n
with <space><space>\n
, which makes a newline in markdown. Code blocks won't align any more (because the rendering won't be in a fixed-width font any more but I think that's the best we can do.
But let's get this change in and we can fix that issue later if it helps unblock things.
It makes sense that images in DMs require the user's login to retrieve. We aren't going to fix that any time soon. I'd be tempted to remove the download URL and the comment about it.
This change doesn't embed images, sadly, since on L596 the body is inside
...
A rendering of the markdown looks like this:
We could consider remove the backquotes. It would break some DMs that have code in them, which appear like this:
"messageCreate" : { "recipientId" : "15840592", "reactions" : [ ], "urls" : [ ], "text" : "kernel void rd_compute(global float *a_in,global float *a_out)\n{\n // parameters:\n const float time ...
We could replace
\n
with<space><space>\n
, which makes a newline in markdown. Code blocks won't align any more (because the rendering won't be in a fixed-width font any more but I think that's the best we can do.
Agreed :)
I did something like this (the newlines in markdown thing) for Tweets in #123 and I think the look of the DMs text body output could be changed to the same format at the Tweets. (I was actually waiting for this here to be merged so I could apply that to DMs anyway, as I said in #123)
I think the fixed width is not more necessary for DMs than it would be for Tweets, and the Tweets md output looks fine without it :) ... on Twitter's own frontends, code snippets would also not be displayed with correct indents and a monospaced font, so I think people are used to that anyway :)