twitter-archive-parser

download_better_images.py should always download :orig versions of PNG, currently keeps lower-quality reencoded versions from archive zip

Open Bananattack opened this issue 3 years ago • 1 comments

The check that download_better_images.py currently uses to detect whether the downloaded file is "better quality" fails for lossless codecs like PNG. In a lossless compression format, a smaller file size doesn't imply a loss of quality; the encoder is just packing the same raw image data into a more compact representation that decodes to the same output.
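To illustrate the point, here is a minimal sketch using only the standard library. PNG stores its pixel data with DEFLATE (zlib) compression, so the same idea can be shown with zlib directly. The `raw_pixels` buffer is a made-up stand-in for decoded image data, not a real image.

```python
import zlib

# Stand-in for raw, decoded pixel data (hypothetical; not a real image).
raw_pixels = bytes(range(256)) * 64

fast = zlib.compress(raw_pixels, 1)   # low compression effort
small = zlib.compress(raw_pixels, 9)  # high compression effort

# Both streams decode to exactly the same bytes, whatever their sizes are.
same_output = zlib.decompress(fast) == zlib.decompress(small) == raw_pixels
print(len(fast), len(small), same_output)
```

The two compressed sizes may differ, but the decoded bytes are identical, which is why comparing file sizes says nothing about quality for a lossless format.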

I recommend editing this script to unconditionally download :orig versions of PNGs, as it makes a big difference. In particular, Twitter's re-encoding of the original PNG introduces anti-aliasing around screenshots/pixel art in the archive versions of media. As a side effect, the archive versions are much lower quality than the original uploads for PNGs. Fixing the download check for PNGs resolves the quality issue.

Thanks for making this tool!

(For a quick workaround, I did this locally. In all cases where the file size is smaller, the quality of the PNG is noticeably better in the :orig than in the archive zip version.

  1. copy this script,
  2. add a line like media_filenames = [filename for filename in media_filenames if os.path.splitext(filename)[1] == '.png'] between the glob and number_of_files lines to only fetch the PNGs
  3. replace if size_after > size_before: with if True: to unconditionally take the downloaded :orig over the existing png.

Of course, a real fix will be more robust than that! Just sharing in case these script modifications help anyone else who runs into a similar PNG quality issue in the meantime.)
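As a rough idea of what a slightly more robust check could look like, here is a sketch that keeps the existing size comparison for other formats but always prefers the :orig download for PNGs. The helper name `should_keep_download` and the `LOSSLESS_EXTENSIONS` set are hypothetical and not part of the script; only `size_before` and `size_after` come from the existing check.

```python
import os

# Assumption: only PNG is treated as lossless here; other formats could be added.
LOSSLESS_EXTENSIONS = {'.png'}

def should_keep_download(filename, size_before, size_after):
    """Return True if the downloaded :orig file should replace the archive copy.

    Hypothetical helper, not part of download_better_images.py.
    """
    extension = os.path.splitext(filename)[1].lower()
    if extension in LOSSLESS_EXTENSIONS:
        # File size says nothing about quality for a lossless format,
        # so always take the :orig version.
        return True
    # For other formats, fall back to the existing size comparison.
    return size_after > size_before
```

With something like this, the `if size_after > size_before:` line could become `if should_keep_download(filename, size_before, size_after):`, assuming the filename is available at that point in the script.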

Bananattack avatar Nov 14 '22 22:11 Bananattack

@Bananattack It's strange, but I don't see this with any PNG in my archive, which includes screenshots. They're all the same size and visually identical to the ones available online with :orig. I ran an image difference on a sample and they are pixelwise identical.

Is this something that only affects old images perhaps? Or did you get your archive a long time ago (or very recently)? I got my archive in April 2022, and the tweets span 2013-2022.

timhutton avatar Nov 16 '22 02:11 timhutton

I too would love for this to default to downloading PNG vs JPG if available.

eisnerguy1 avatar Nov 18 '22 03:11 eisnerguy1