Extracting additional information from images
We'd like to extract the following information from every bitmap image present in an EPUB package:
- [x] height
- [x] width
- [x] size of the file
- [x] presence of animated content
- [x] cryptographic hash
- [x] perceptual hash
This will be based on https://pkg.go.dev/golang.org/x/image plus additional packages for hashing.
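For the first three items (height, width, file size), a minimal sketch using only the Go standard library could look like the following. The `ImageInfo` type, the `Inspect` function and the `samplePNG` helper are illustrative names and not part of go-toolkit; formats outside the standard library (WebP, for instance) would additionally need their decoders from golang.org/x/image to be registered.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	_ "image/gif"  // registers the GIF decoder
	_ "image/jpeg" // registers the JPEG decoder
	"image/png"    // registers the PNG decoder, also used to build the sample
)

// ImageInfo holds the basic facts listed above.
// The type and its field names are hypothetical.
type ImageInfo struct {
	Format string
	Width  int
	Height int
	Size   int64
}

// Inspect reads only the image header to obtain the dimensions,
// and reports the byte length of the input as the file size.
func Inspect(data []byte) (ImageInfo, error) {
	cfg, format, err := image.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return ImageInfo{}, fmt.Errorf("decoding image header: %w", err)
	}
	return ImageInfo{
		Format: format,
		Width:  cfg.Width,
		Height: cfg.Height,
		Size:   int64(len(data)),
	}, nil
}

// samplePNG builds a small in-memory PNG so the example needs no files.
func samplePNG(w, h int) []byte {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	img.Set(0, 0, color.RGBA{R: 255, A: 255})
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	info, err := Inspect(samplePNG(4, 3))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %dx%d, %d bytes\n", info.Format, info.Width, info.Height, info.Size)
}
```

Reading only the header with `image.DecodeConfig` avoids decoding pixel data when dimensions are all that is needed.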
All of this information will be added to the shared model for the Link Object and also included in the JSON/RWPM output:
- Link Objects can already document a height and a width
- We might consider adding the size of a file as a core property of the Link Object (size?)
- While the presence of animated content and crypto/perceptual hashes should rely on properties and extensions
@HadrienGardeur If an image is animated, do we want the perceptual hash to be of the first frame? Or should we not try to create a perceptual hash in that case?
A test from the code I'm working on:
{
"href": "test.png",
"properties": {
"animated": false,
"hashes": {
"blake2b": "7N3QCMPeq4S0VtYExLXQJ7v/aqQoyKKwnu5j8V7yxpk=",
"md5": "5weR/lhjwhVjOHCa26neqQ==",
"perceptual": "afx3f4x7oga8",
"sha1": "9qfkrQQBtTPHUXGD4Elx08smZ1Q=",
"sha256": "XvlAP3fAZAH5xvc1h9wXuyRaYYI7tV5Mhl4kmOjvxes=",
"xxh3": "YEu5PnFqwiw="
},
"modified": 1740211344,
"size": 1545264
},
"height": 1634,
"width": 2496
}
How about adding a BlurHash as well? https://blurha.sh/
This could be useful in different places:
- before the images are fetched, for example if the user scrolls quickly through the book
- to display a thumbnail slider, when we don't have thumbnails yet
On the last call that we had together with @chocolatkey, we discussed the idea of an object model for hashes. We talked about strings vs. URIs for managing this vocabulary of algorithms, but we could default to a registry of strings, with URIs for extensions.
If I repurpose the example shared by @chocolatkey (you'll note that size shouldn't be in properties, it's supported by the Link Object now):
{
"href": "test.png",
"type": "image/png",
"size": 1545264,
"height": 1634,
"width": 2496,
"properties": {
"animated": false,
"hash": [
{
"algorithm": "blake2b",
"value": "7N3QCMPeq4S0VtYExLXQJ7v/aqQoyKKwnu5j8V7yxpk="
},
{
"algorithm": "https://blurha.sh/",
"value": "abc123"
}
]
}
}
It's not nearly as compact, but it would be more consistent with what we do for encrypted (and soon for archive).
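To make the object model concrete, here is a rough Go sketch of how such a hash entry could be represented and serialized. The `Hash` type, `IsExtension` and `Encode` are hypothetical names, and treating any algorithm containing `://` as a URI extension is only an assumption, since the thread does not define how registry strings and URIs are told apart.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Hash mirrors one entry of the "hash" array in the example above.
// The Go type name is hypothetical.
type Hash struct {
	Algorithm string `json:"algorithm"`
	Value     string `json:"value"`
}

// IsExtension reports whether the algorithm looks like a URI rather
// than a plain registry string. The "://" check is an assumption.
func (h Hash) IsExtension() bool {
	return strings.Contains(h.Algorithm, "://")
}

// Encode serializes the hashes under a "hash" key, as in the example.
func Encode(hashes []Hash) (string, error) {
	out, err := json.Marshal(map[string][]Hash{"hash": hashes})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	hashes := []Hash{
		{Algorithm: "blake2b", Value: "7N3QCMPeq4S0VtYExLXQJ7v/aqQoyKKwnu5j8V7yxpk="},
		{Algorithm: "https://blurha.sh/", Value: "abc123"},
	}
	out, err := Encode(hashes)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	for _, h := range hashes {
		fmt.Printf("%s extension=%v\n", h.Algorithm, h.IsExtension())
	}
}
```

An array of objects keeps the door open for extra per-hash fields later, which a flat map of algorithm to value would not.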
> @HadrienGardeur If an image is animated, do we want the perceptual hash to be of the first frame? Or should we not try to create a perceptual hash in that case?
@chocolatkey let's use the first frame in that case.
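As an illustration of the "first frame" approach, the sketch below detects animation in a GIF and returns the frame that a perceptual hash would then be computed from. It only covers GIF via the standard library; `FirstFrame` and `sampleGIF` are hypothetical names, and treating "more than one frame" as "animated" is an assumption.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"image"
	"image/color"
	"image/gif"
)

// FirstFrame decodes a GIF and returns its first frame, along with
// whether the file contains more than one frame.
func FirstFrame(data []byte) (image.Image, bool, error) {
	g, err := gif.DecodeAll(bytes.NewReader(data))
	if err != nil {
		return nil, false, fmt.Errorf("decoding gif: %w", err)
	}
	if len(g.Image) == 0 {
		return nil, false, errors.New("gif contains no frames")
	}
	return g.Image[0], len(g.Image) > 1, nil
}

// sampleGIF builds an in-memory 2x2 GIF with the given number of frames.
func sampleGIF(frames int) []byte {
	pal := color.Palette{color.Black, color.White}
	g := &gif.GIF{}
	for i := 0; i < frames; i++ {
		g.Image = append(g.Image, image.NewPaletted(image.Rect(0, 0, 2, 2), pal))
		g.Delay = append(g.Delay, 10) // delay in hundredths of a second
	}
	var buf bytes.Buffer
	if err := gif.EncodeAll(&buf, g); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	frame, animated, err := FirstFrame(sampleGIF(3))
	if err != nil {
		panic(err)
	}
	fmt.Printf("animated=%v, first frame %v\n", animated, frame.Bounds())
}
```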
Updated example:
{
"height": 345,
"href": "5928969.png",
"properties": {
"animated": false,
"hash": [{
"algorithm": "blake2b",
"value": "yT0VUQ5WyszBNuUFF8O4iVCEgI3eCZNNi0BneL76ZIM="
}, {
"algorithm": "sha256",
"value": "L7tBWvLu2y8UhfiT9crrIQXupnsdqQAKc99HbNWd5/Q="
}, {
"algorithm": "sha1",
"value": "5jd5/6sbnqoVSGeRL4V2DWsnFgQ="
}, {
"algorithm": "md5",
"value": "cMmjWnvJRgsYisRqpvsqxA=="
}, {
"algorithm": "xxh3",
"value": "NWgJ+VzcdCY="
}, {
"algorithm": "phash-dct",
"value": "TXm1qEgim5c="
}, {
"algorithm": "https://blurha.sh",
"value": "eoOLKA?^X-.8tkq_t7pHRjR*XntQkVnjnOyCo|i{RjenEfV@r@WooL"
}]
},
"size": 42367,
"width": 345
}
Looking good syntax-wise.
That's a lot of different hashes, though, so I think we'll need to decide on a good default. sha1 and md5 are legacy algorithms at this point. What should we default to for the cryptographic one?
Despite my yearning for more modern algorithms, realistically the default should be SHA-256. It's fast and has built-in support in every major programming language. I do think we should support at least md5 as well, though, to accommodate any legacy restrictions as well as the use of this algorithm in file transfers and storage such as S3.
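Supporting both does not have to mean reading each file twice. Here is a small sketch of computing SHA-256 and md5 in a single pass with `io.MultiWriter`, with values base64-encoded as in the examples earlier in this thread. `Digests` and `DigestBytes` are hypothetical names, not the analyzer's actual API.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"hash"
	"io"
)

// Digests streams r once and returns base64-encoded digests keyed by
// algorithm name.
func Digests(r io.Reader) (map[string]string, error) {
	hashers := map[string]hash.Hash{
		"sha256": sha256.New(),
		"md5":    md5.New(),
	}
	writers := make([]io.Writer, 0, len(hashers))
	for _, h := range hashers {
		writers = append(writers, h)
	}
	// Every chunk read from r is written to all hashers at once.
	if _, err := io.Copy(io.MultiWriter(writers...), r); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(hashers))
	for name, h := range hashers {
		out[name] = base64.StdEncoding.EncodeToString(h.Sum(nil))
	}
	return out, nil
}

// DigestBytes is a convenience wrapper for in-memory data.
func DigestBytes(data []byte) (map[string]string, error) {
	return Digests(bytes.NewReader(data))
}

func main() {
	d, err := DigestBytes([]byte("example image bytes"))
	if err != nil {
		panic(err)
	}
	fmt.Println("sha256:", d["sha256"])
	fmt.Println("md5:   ", d["md5"])
}
```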
I've pushed the WIP image analyzer (which includes a command-line tool for testing) to a new branch: https://github.com/readium/go-toolkit/tree/image-inference/cmd/analyzer https://github.com/readium/go-toolkit/blob/image-inference/pkg/analyzer/image.go
What doesn't work:
- JXL file format support (no surprise)
- Visual hashing & detection of animation for AVIF. Width/height works
- Visual hashing of animated WEBP. Non-animated is fine
Aside from AVIF animation, any unsupported format/operation will return an error. The analysis function also has a flag to toggle the visual hashing, since it incurs a larger performance penalty and has slightly less support.
> I've pushed the WIP image analyzer (which includes a command-line tool for testing) to a new branch: https://github.com/readium/go-toolkit/tree/image-inference/cmd/analyzer https://github.com/readium/go-toolkit/blob/image-inference/pkg/analyzer/image.go
I'm currently traveling so I'm not on a computer with the necessary environment for building Go. Is there a build available somewhere that I could use on an M1 MacBook Air?
> - JXL file format support (no surprise)
> - Visual hashing & detection of animation for AVIF. Width/height works
> - Visual hashing of animated WEBP. Non-animated is fine
These are reasonable compromises and unlikely to affect our workflow. We can file issues for all three to keep track of them and resolve them later.
@HadrienGardeur The code to enhance a link to an image with the properties you listed in the OP is finished, but your original request also involves a component that loops through all the assets in an EPUB to perform this link enhancement. Are you thinking of a flag for rwp manifest? And if so, what should it be called/how should it function?
I think that this should be behind a new flag where an optional list of tokens can be provided to specify how images should be hashed.
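To give an idea of how such a token list could be handled, here is a sketch using the standard `flag` package. Everything in it is an assumption for discussion: the `--hash` flag name, the comma-separated syntax, falling back to sha256 when no token is given, and the set of accepted tokens (taken from the algorithm names used in the examples above). It does not reflect how the rwp command is actually wired.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// supported lists the tokens accepted by this sketch; the names are
// copied from the examples in this thread.
var supported = map[string]bool{
	"sha256": true, "md5": true, "sha1": true, "blake2b": true,
	"xxh3": true, "phash-dct": true, "https://blurha.sh": true,
}

// ParseHashTokens turns a comma-separated list into a deduplicated,
// validated list of algorithms. Empty input falls back to sha256,
// which is an assumed default.
func ParseHashTokens(s string) ([]string, error) {
	var out []string
	seen := map[string]bool{}
	for _, tok := range strings.Split(s, ",") {
		tok = strings.ToLower(strings.TrimSpace(tok))
		if tok == "" || seen[tok] {
			continue
		}
		if !supported[tok] {
			return nil, fmt.Errorf("unsupported hash algorithm %q", tok)
		}
		seen[tok] = true
		out = append(out, tok)
	}
	if len(out) == 0 {
		return []string{"sha256"}, nil
	}
	return out, nil
}

func main() {
	// "manifest" and "hash" are hypothetical command and flag names.
	fs := flag.NewFlagSet("manifest", flag.ContinueOnError)
	raw := fs.String("hash", "", "comma-separated list of hash algorithms")
	if err := fs.Parse([]string{"--hash", "sha256, md5"}); err != nil {
		panic(err)
	}
	algos, err := ParseHashTokens(*raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(algos)
}
```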