booxtream-epub-drm-remover
Script fails to remove covert image watermarking in newer epubs
While looking into #5, I downloaded a free ebook from Verso under two different accounts, ran the script on both copies, and then diffed the results using the diff.sh script included in this repo to verify that the output was identical.
It turns out it isn't:
This is from a run of the most recent version of the script, where #5 is fixed. Ignoring the spurious error from calibre inserting a bookmark file into one of the epubs, the PNGs in the Images/ directory are not the same.
A quick binary diff of the images shows that significant portions of them differ, so it appears to be some kind of covert watermarking that the script does not handle.
Ah, thanks for investigating further. The script already wipes exif/metadata from images, so it must be something new/different.
Any chance you could send me the two non-cleaned epubs so I could test as well? My email is in my profile.
The PNGs differ because of timestamps embedded in the files. The ImageMagick thingy saves the ctime and mtime into the PNGs:
$ diff -Naur <(identify -verbose fooub2/OEBPS/images/3e8f510f36f2f8f255dd.png) <(identify -verbose fooub3/OEBPS/images/3e8f510f36f2f8f255dd.png)
--- /dev/fd/63 2018-03-14 21:30:55.775635900 -0400
+++ /dev/fd/62 2018-03-14 21:30:55.780978800 -0400
@@ -1,4 +1,4 @@
-Image: fooub2/OEBPS/images/3e8f510f36f2f8f255dd.png
+Image: fooub3/OEBPS/images/3e8f510f36f2f8f255dd.png
Format: PNG (Portable Network Graphics)
Mime type: image/png
Class: DirectClass
@@ -456,8 +456,8 @@
Compression: Zip
Orientation: Undefined
Properties:
- date:create: 2018-03-14T21:07:13-04:00
- date:modify: 2018-03-14T21:07:04-04:00
+ date:create: 2018-03-14T21:06:19-04:00
+ date:modify: 2018-03-14T21:06:12-04:00
png:bKGD: chunk was found (see Background color, above)
png:cHRM: chunk was found (see Chromaticity, above)
png:IHDR.bit-depth-orig: 8
@@ -467,10 +467,10 @@
png:IHDR.interlace_method: 0 (Not interlaced)
png:IHDR.width,height: 82, 52
png:sRGB: intent=0 (Perceptual Intent)
- png:tIME: 2018-03-15T01:07:04Z
+ png:tIME: 2018-03-15T01:06:13Z
signature: 1b5af409341c6453c741c589eb7dc9ef4142db5ca9527db68ebabcb83e661981
Artifacts:
- filename: fooub2/OEBPS/images/3e8f510f36f2f8f255dd.png
+ filename: fooub3/OEBPS/images/3e8f510f36f2f8f255dd.png
verbose: true
Tainted: False
Filesize: 3.6KB
Like the zip timestamps after processing, those times correlate to file creation.
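One way to separate timestamp differences like the ones above from real changes to the image data is to compare only the IHDR header and the decompressed IDAT stream, skipping ancillary chunks such as tIME and tEXt. The following is a minimal Python sketch, not part of this repo's script. The function names are hypothetical, and it assumes both files were encoded with the same scanline filters (a re-encode with different filter choices could report a difference even when the pixels match).

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data):
    """Yield (chunk_type, payload) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        yield chunk_type, data[pos + 8:pos + 8 + length]
        pos += 12 + length


def image_data(data):
    """Return (IHDR payloads, decompressed IDAT stream), ignoring all other chunks."""
    chunks = list(iter_chunks(data))
    ihdr = [payload for chunk_type, payload in chunks if chunk_type == b"IHDR"]
    idat = b"".join(payload for chunk_type, payload in chunks if chunk_type == b"IDAT")
    return ihdr, zlib.decompress(idat)


def same_image_data(a, b):
    """True if two PNG byte strings agree once tIME/tEXt and similar chunks are ignored."""
    return image_data(a) == image_data(b)
```

If `same_image_data` returns True for the raw bytes of two copies of the same image, the files differ only in metadata; False means the encoded image data itself is different and is worth inspecting pixel by pixel.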
@Artoria2e5
This could have changed recently, or it could be an optional feature of BooXtream, but I bought two copies of the same book and there are definitely actual differences in the content of the images, not just metadata:
[~/book]$ diff -Naur <(identify -verbose cleaned/a/OEBPS/images/5316eab139b427a6a5ff.png) <(identify -verbose cleaned/b/OEBPS/images/5316eab139b427a6a5ff.png) | grep srgb | wc -l
70
[~/book]$ diff a.ppm b.ppm | wc -l
32
[~/book]$ diff a.ppm b.ppm | head -n 6
xxx0,xxx1cxxx0,xxx1
< 2x
< 2x
---
> 2y
> 2y
"a.ppm" and "b.ppm" are PPM files generated by GIMP from cleaned/a/OEBPS/images/5316eab139b427a6a5ff.png and cleaned/b/OEBPS/images/5316eab139b427a6a5ff.png. Some diff output has been omitted for paranoia reasons.
Locating the affected pixels in the original, unedited PNG files and using the dropper tool in GIMP confirms that the images differ slightly.
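For anyone who wants to repeat this check without reading diff output by eye, the PPM comparison can be sketched in Python roughly as follows. This is only an illustration: the function names are hypothetical, and it assumes the PPM files were exported in the plain ASCII (P3) variant rather than the raw binary (P6) one.

```python
def read_ascii_ppm(text):
    """Parse a plain (P3) PPM string into (width, height, list of RGB tuples)."""
    tokens = []
    for line in text.splitlines():
        # Drop '#' comments, which exporters may add to the header.
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P3":
        raise ValueError("only plain (P3) PPM is handled in this sketch")
    width, height = int(tokens[1]), int(tokens[2])
    # tokens[3] is the maximum sample value; the samples follow it.
    values = [int(t) for t in tokens[4:]]
    if len(values) != width * height * 3:
        raise ValueError("unexpected number of samples")
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return width, height, pixels


def differing_pixels(ppm_a, ppm_b):
    """Return a list of (x, y, rgb_a, rgb_b) for every pixel that differs."""
    width_a, height_a, pixels_a = read_ascii_ppm(ppm_a)
    width_b, height_b, pixels_b = read_ascii_ppm(ppm_b)
    if (width_a, height_a) != (width_b, height_b):
        raise ValueError("images have different dimensions")
    return [
        (i % width_a, i // width_a, p, q)
        for i, (p, q) in enumerate(zip(pixels_a, pixels_b))
        if p != q
    ]
```

Passing the text of the two PPM files to `differing_pixels` gives the coordinates and colour values of each changed pixel, which can then be looked up in the original PNGs.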