twitter-archive-parser
download_better_images.py should always download :orig versions of PNG, currently keeps lower-quality reencoded versions from archive zip
The current check used by download_better_images.py to detect whether the downloaded file is "better quality" fails for lossless formats like PNG. In a lossless format, a smaller file size doesn't imply a loss of quality; the encoder has just packed the same raw image into a more compact representation that decodes to the same output.
I recommend editing this script to unconditionally download :orig versions of PNGs, as it makes a big difference. In particular, the archive versions of media show anti-aliasing around screenshots/pixel art, introduced when Twitter re-encoded the original PNG. As a side effect, the archive versions are much lower quality than the original upload for PNGs. Fixing the download check for PNGs would resolve the quality issue.
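To illustrate why file size is a poor quality signal for lossless data, here is a minimal sketch using Python's `zlib` module (PNG uses DEFLATE compression internally). The `raw_pixels` buffer is a made-up stand-in for decoded image data, not a real PNG, and `decodes_to_same_data` is a hypothetical helper name.

```python
import zlib

# Hypothetical stand-in for raw pixel data: a repetitive byte pattern,
# similar to the flat areas in a screenshot or pixel art.
raw_pixels = bytes(range(256)) * 64

# Encode the same data twice: once stored without compression (level 0),
# once with maximum compression (level 9).
stored = zlib.compress(raw_pixels, 0)
packed = zlib.compress(raw_pixels, 9)

def decodes_to_same_data(a, b):
    """Return True if two compressed buffers decompress to identical bytes."""
    return zlib.decompress(a) == zlib.decompress(b)

print(len(stored), len(packed), decodes_to_same_data(stored, packed))
```

The two encodings differ in size but decode to the same bytes, so a "bigger is better" comparison says nothing about which one is higher quality.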
Thanks for making this tool!
(For a quick workaround, I did this locally -- in all cases where the :orig file size was smaller, the quality of the PNG was noticeably better in the :orig than in the archive zip version.
- copy this script,
- add a line like `media_filenames = [filename for filename in media_filenames if os.path.splitext(filename)[1] == '.png']` between the glob + number_of_files lines to only fetch the PNGs,
- replace `if size_after > size_before:` with `if True:` to unconditionally take the downloaded :orig over the existing PNG.

Of course, a real fix will be more robust than that! Just sharing in case those script modifications help anyone else who encounters a similar PNG quality issue in the meantime.)
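A slightly more robust version of the check might look like the sketch below. This is only an illustration, not the script's actual code: `should_replace` and `LOSSLESS_EXTENSIONS` are hypothetical names, and the sketch assumes the script has `size_before` and `size_after` (in bytes) available at the point where it decides whether to keep the download.

```python
import os

# Assumption: formats where a smaller file does not imply lower quality.
LOSSLESS_EXTENSIONS = {'.png'}

def should_replace(filename, size_before, size_after):
    """Decide whether the downloaded :orig file should replace the archive copy."""
    # An empty download is never an improvement.
    if size_after <= 0:
        return False
    extension = os.path.splitext(filename)[1].lower()
    # For lossless formats, always prefer the :orig version,
    # since file size says nothing about quality.
    if extension in LOSSLESS_EXTENSIONS:
        return True
    # For lossy formats such as JPG, keep the existing size heuristic.
    return size_after > size_before
```

With this helper, the `if size_after > size_before:` line would become something like `if should_replace(filename, size_before, size_after):`, so the behavior for JPGs stays the same.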
@Bananattack It's strange, but I don't see this with any PNG in my archive, which includes screenshots. They're all the same size and visually identical to the ones available online with :orig. On a sample, I did an image difference, and they are pixelwise identical.
Is this something that only affects old images perhaps? Or did you get your archive a long time ago (or very recently)? I got my archive in April 2022, and the tweets span 2013-2022.
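For anyone who wants to check their own archive, a byte-level comparison is a quick first test that needs no image library. The sketch below is stricter than the pixelwise difference mentioned above: identical bytes imply identical pixels, but two PNGs with different bytes can still decode to the same image. `file_digest` and `files_identical` are hypothetical helper names.

```python
import hashlib

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def files_identical(path_a, path_b):
    """Return True if both files contain exactly the same bytes."""
    return file_digest(path_a) == file_digest(path_b)
```

If `files_identical` returns False for an archive PNG and its :orig download, a pixelwise comparison would still be needed to tell whether the image content actually differs.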
I too would love for this to default to downloading PNG vs JPG if available.