go-toolkit Extracting additional information from images

We'd like to extract the following information from every bitmap image present in an EPUB package:

[x] height
[x] width
[x] size of the file
[x] presence of animated content
[x] cryptographic hash
[x] perceptual hash

This will be based on https://pkg.go.dev/golang.org/x/image plus additional packages for hashing.

All of this information will be added to the shared model for the Link Object and included in the JSON/RWPM output:

Link Objects can already document a height and a width
We might consider adding the size of a file as a core property of the Link Object (size?)
The presence of animated content and the cryptographic/perceptual hashes should rely on properties and extensions
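
For the first three items (height, width, file size), here is a minimal sketch using only the standard library. It reads the image header with image.DecodeConfig, so no pixel data is decoded. The ImageInfo type and the inspect and samplePNG helpers are hypothetical names for illustration, not the toolkit's model. Decoders from golang.org/x/image (for example webp, bmp, tiff) could be registered the same way through blank imports.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/png" // importing a format package registers its decoder with the image package
)

// ImageInfo holds facts that can be read without decoding pixel data.
// Hypothetical type for illustration only.
type ImageInfo struct {
	Width, Height int
	Size          int64  // size of the encoded file in bytes
	Format        string // format name reported by the decoder, e.g. "png"
}

// inspect reads only the image header to get the dimensions and the format.
func inspect(data []byte) (ImageInfo, error) {
	cfg, format, err := image.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return ImageInfo{}, fmt.Errorf("unsupported or corrupt image: %w", err)
	}
	return ImageInfo{
		Width:  cfg.Width,
		Height: cfg.Height,
		Size:   int64(len(data)),
		Format: format,
	}, nil
}

// samplePNG builds a small in-memory PNG so the example needs no files.
func samplePNG(w, h int) []byte {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	img.Set(0, 0, color.RGBA{R: 255, A: 255})
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	info, err := inspect(samplePNG(4, 3))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%dx%d %s, %d bytes\n", info.Width, info.Height, info.Format, info.Size)
}
```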

Jan 04 '25 13:01 HadrienGardeur

@HadrienGardeur If an image is animated, do we want the perceptual hash to be of the first frame? Or should we not try to create a perceptual hash in that case?

Feb 22 '25 04:02 chocolatkey

A test from the code I'm working on:

{
    "href": "test.png",
    "properties": {
        "animated": false,
        "hashes": {
            "blake2b": "7N3QCMPeq4S0VtYExLXQJ7v/aqQoyKKwnu5j8V7yxpk=",
            "md5": "5weR/lhjwhVjOHCa26neqQ==",
            "perceptual": "afx3f4x7oga8",
            "sha1": "9qfkrQQBtTPHUXGD4Elx08smZ1Q=",
            "sha256": "XvlAP3fAZAH5xvc1h9wXuyRaYYI7tV5Mhl4kmOjvxes=",
            "xxh3": "YEu5PnFqwiw="
        },
        "modified": 1740211344,
        "size": 1545264
    },
    "height": 1634,
    "width": 2496
}

Feb 22 '25 08:02 chocolatkey

How about adding a BlurHash as well? https://blurha.sh/

This could be useful in different places:

before the images are fetched, for example if the user scrolls quickly through the book
to display a thumbnail slider, when we don't have thumbnails yet

Feb 22 '25 09:02 mickael-menu

On the last call that we had with @chocolatkey, we discussed the idea of an object model for hashes. We talked about strings vs. URIs for managing this vocabulary of algorithms, and we could default to a registry of strings, with URIs for extensions.

If I repurpose the example shared by @chocolatkey (you'll note that size shouldn't be in properties, it's supported by the Link Object now):

{
  "href": "test.png",
  "type": "image/png",
  "size": 1545264,
  "height": 1634,
  "width": 2496,
  "properties": {
    "animated": false,
    "hash": [
      {
        "algorithm": "blake2b",
        "value": "7N3QCMPeq4S0VtYExLXQJ7v/aqQoyKKwnu5j8V7yxpk="
      },
      {
        "algorithm": "https://blurha.sh/",
        "value": "abc123"
      }
    ]
  }
}

It's not nearly as compact, but it would be more consistent with what we do for encrypted (and soon for archive).
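
As a rough sketch of how this object model could map to Go structs (the type names, the pointer for animated, and the isExtension helper are assumptions for illustration, not the shared model in the toolkit):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Hash pairs an algorithm identifier with an encoded value.
type Hash struct {
	Algorithm string `json:"algorithm"`
	Value     string `json:"value"`
}

// Properties carries the extension data discussed in this thread.
type Properties struct {
	Animated *bool  `json:"animated,omitempty"` // pointer so that false is still serialized
	Hash     []Hash `json:"hash,omitempty"`
}

// Link is a reduced, hypothetical version of the Link Object.
type Link struct {
	Href       string      `json:"href"`
	Type       string      `json:"type,omitempty"`
	Size       int64       `json:"size,omitempty"`
	Height     int         `json:"height,omitempty"`
	Width      int         `json:"width,omitempty"`
	Properties *Properties `json:"properties,omitempty"`
}

// isExtension reports whether an algorithm identifier is an absolute URI
// (an extension) rather than a plain string from the registry.
func isExtension(algorithm string) bool {
	u, err := url.Parse(algorithm)
	return err == nil && u.IsAbs()
}

// encodeLink serializes a link to compact JSON.
func encodeLink(l Link) (string, error) {
	b, err := json.Marshal(l)
	return string(b), err
}

func main() {
	animated := false
	out, err := encodeLink(Link{
		Href: "test.png", Type: "image/png", Size: 1545264, Height: 1634, Width: 2496,
		Properties: &Properties{
			Animated: &animated,
			Hash: []Hash{
				{Algorithm: "blake2b", Value: "7N3QCMPeq4S0VtYExLXQJ7v/aqQoyKKwnu5j8V7yxpk="},
				{Algorithm: "https://blurha.sh/", Value: "abc123"},
			},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	fmt.Println(isExtension("blake2b"), isExtension("https://blurha.sh/"))
}
```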

Feb 22 '25 12:02 HadrienGardeur

@HadrienGardeur If an image is animated, do we want the perceptual hash to be of the first frame? Or should we not try to create a perceptual hash in that case?

@chocolatkey let's use the first frame in that case.
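
A minimal sketch of that rule, using GIF because the standard library can decode all of its frames. The firstFrame and sampleGIF helpers are hypothetical; animated PNG, WebP or AVIF would need other decoders and are not covered here.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/gif"
)

// firstFrame decodes every frame of a GIF, reports whether there is more than
// one, and returns the first frame as the input for perceptual hashing.
func firstFrame(data []byte) (frame image.Image, animated bool, err error) {
	g, err := gif.DecodeAll(bytes.NewReader(data))
	if err != nil {
		return nil, false, err
	}
	if len(g.Image) == 0 {
		return nil, false, fmt.Errorf("gif contains no frames")
	}
	return g.Image[0], len(g.Image) > 1, nil
}

// sampleGIF builds an in-memory GIF with n frames (n >= 1) for the example.
func sampleGIF(n int) []byte {
	pal := color.Palette{color.Black, color.White}
	g := &gif.GIF{}
	for i := 0; i < n; i++ {
		g.Image = append(g.Image, image.NewPaletted(image.Rect(0, 0, 2, 2), pal))
		g.Delay = append(g.Delay, 10) // one delay entry per frame is required
	}
	var buf bytes.Buffer
	if err := gif.EncodeAll(&buf, g); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	frame, animated, err := firstFrame(sampleGIF(3))
	if err != nil {
		panic(err)
	}
	fmt.Println("animated:", animated, "first frame bounds:", frame.Bounds())
}
```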

Feb 23 '25 13:02 HadrienGardeur

Updated example:

{
    "height": 345,
    "href": "5928969.png",
    "properties": {
        "animated": false,
        "hash": [{
            "algorithm": "blake2b",
            "value": "yT0VUQ5WyszBNuUFF8O4iVCEgI3eCZNNi0BneL76ZIM="
        }, {
            "algorithm": "sha256",
            "value": "L7tBWvLu2y8UhfiT9crrIQXupnsdqQAKc99HbNWd5/Q="
        }, {
            "algorithm": "sha1",
            "value": "5jd5/6sbnqoVSGeRL4V2DWsnFgQ="
        }, {
            "algorithm": "md5",
            "value": "cMmjWnvJRgsYisRqpvsqxA=="
        }, {
            "algorithm": "xxh3",
            "value": "NWgJ+VzcdCY="
        }, {
            "algorithm": "phash-dct",
            "value": "TXm1qEgim5c="
        }, {
            "algorithm": "https://blurha.sh",
            "value": "eoOLKA?^X-.8tkq_t7pHRjR*XntQkVnjnOyCo|i{RjenEfV@r@WooL"
        }]
    },
    "size": 42367,
    "width": 345
}

Feb 24 '25 01:02 chocolatkey

Looking good syntax-wise.

That's a lot of different hashes though, so I think we'll need to decide on a good default. sha1 and md5 are legacy algorithms at this point, so what should we default to for the cryptographic one?

Feb 24 '25 09:02 HadrienGardeur

Despite my yearning for more modern algorithms, realistically the default should be SHA-256. It's fast and has built-in support in every major programming language. I do think we should support at least md5 as well though, to accommodate any legacy restrictions as well as the use of this algorithm in file transfers and storage such as S3.
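
Computing both digests does not require reading the file twice. A minimal sketch, assuming the values are the standard base64 encoding of the raw digest (which matches the length of the values in the examples in this thread); the digests and digestsOfString helpers are hypothetical names:

```go
package main

import (
	"crypto/md5"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// digests reads r once and returns the base64-encoded SHA-256 and MD5 digests.
func digests(r io.Reader) (sha, md string, err error) {
	s, m := sha256.New(), md5.New()
	// MultiWriter feeds every chunk read from r to both hash states.
	if _, err = io.Copy(io.MultiWriter(s, m), r); err != nil {
		return "", "", err
	}
	enc := base64.StdEncoding
	return enc.EncodeToString(s.Sum(nil)), enc.EncodeToString(m.Sum(nil)), nil
}

// digestsOfString is a convenience wrapper for in-memory input.
func digestsOfString(v string) (string, string, error) {
	return digests(strings.NewReader(v))
}

func main() {
	sha, md, err := digestsOfString("hello")
	if err != nil {
		panic(err)
	}
	fmt.Println("sha256:", sha)
	fmt.Println("md5:", md)
}
```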

Feb 24 '25 23:02 chocolatkey

I've pushed the WIP image analyzer (which includes a command-line tool for testing) to a new branch: https://github.com/readium/go-toolkit/tree/image-inference/cmd/analyzer https://github.com/readium/go-toolkit/blob/image-inference/pkg/analyzer/image.go

What doesn't work:

JXL file format support (no surprise)
Visual hashing & detection of animation for AVIF. Width/height works
Visual hashing of animated WEBP. Non-animated is fine

Aside from AVIF animation, any unsupported format/operation will return an error. The analysis function also has a flag to toggle the visual hashing, since it incurs a larger performance penalty and has slightly less support.

Feb 26 '25 03:02 chocolatkey

I've pushed the WIP image analyzer (which includes a command-line tool for testing) to a new branch: https://github.com/readium/go-toolkit/tree/image-inference/cmd/analyzer https://github.com/readium/go-toolkit/blob/image-inference/pkg/analyzer/image.go

I'm currently traveling so I'm not on a computer with the necessary environment for building Go. Is there a build available somewhere that I could use on an M1 MacBook Air?

JXL file format support (no surprise)

Visual hashing & detection of animation for AVIF. Width/height works

Visual hashing of animated WEBP. Non-animated is fine

These are reasonable compromises and unlikely to affect our workflow. We can file an issue for each of the three to keep track of them and solve them later on.

Feb 27 '25 10:02 HadrienGardeur

@HadrienGardeur The code to enhance a link to an image with the properties you listed in the OP is finished, but your original request also involves a component that loops through all the assets in an EPUB to perform this link enhancement. Are you thinking of a flag for rwp manifest? And if so, what should it be called/how should it function?

Apr 17 '25 07:04 chocolatkey

I think that this should be behind a new flag where an optional list of tokens can be provided to specify how images should be hashed.
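
A sketch of how such a token list could be parsed. Everything here is an assumption: the flag itself has no name yet, the comma separator and the fallback to sha256 for an empty value are guesses, and the set of accepted tokens is just the algorithm names that appear in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// knownAlgorithms is an illustrative registry built from the names used in
// this thread; it is not an agreed vocabulary.
var knownAlgorithms = map[string]bool{
	"sha256": true, "md5": true, "sha1": true, "blake2b": true, "xxh3": true,
	"phash-dct": true, "https://blurha.sh": true,
}

// parseHashTokens splits a comma-separated flag value into algorithm tokens.
// An empty value falls back to sha256 (assumed default). Duplicates are
// dropped and unknown tokens are rejected.
func parseHashTokens(value string) ([]string, error) {
	var out []string
	seen := map[string]bool{}
	for _, tok := range strings.Split(value, ",") {
		tok = strings.ToLower(strings.TrimSpace(tok))
		if tok == "" || seen[tok] {
			continue
		}
		if !knownAlgorithms[tok] {
			return nil, fmt.Errorf("unknown hash algorithm %q", tok)
		}
		seen[tok] = true
		out = append(out, tok)
	}
	if len(out) == 0 {
		return []string{"sha256"}, nil
	}
	return out, nil
}

func main() {
	for _, v := range []string{"", "sha256, md5,phash-dct", "crc32"} {
		tokens, err := parseHashTokens(v)
		fmt.Printf("%q -> %v (err: %v)\n", v, tokens, err)
	}
}
```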

Apr 18 '25 07:04 HadrienGardeur