Pillow Rename 16-bit RGB rawmodes

Fixes #8021. Supersedes #7965. This is based on #8026, but that pull request has been open for nearly two months now without any activity, and the next version of Pillow is coming soon. I'd like for this change to go out with the #7978 changes so that the deprecation timelines match.

RGB;15 → XBGR;1555 RGB;16 → BGR;565 BGR;5 → XRGB;1555 BGR;15 → XRGB;1555 BGR;16 → RGB;565 RGB;4B → XBGR;4 RGBA;4B → ABGR;4 RGBA;15 → ARGB;1555 BGRA;15 → ABGR;1555

Packers were added for XRGB;1555, RGB;565, XBGR;1555, and BGR;565. Previously it was possible to use the BGR;15 and BGR;16 modes for XRGB;1555 and RGB;565 data, but since those modes are being deprecated, these new packers are needed to keep the ability to write that data.

The unpackers were rewritten so that their names match the new rawmodes, and to reduce the usage of multiplication and division for performance reasons.

Jun 21 '24 16:06 Yay295

The test suite is printing a number of deprecation warnings here - https://github.com/python-pillow/Pillow/actions/runs/9616954928/job/26527638175?pr=8158#step:10:5144

For comparison, see main where deprecation warnings are not printed by the test suite - https://github.com/python-pillow/Pillow/actions/runs/9622876608/job/26544673674#step:10:4868

Jun 22 '24 07:06 radarhere

Those warnings should be gone now.

Jun 22 '24 14:06 Yay295

Packers were added for XRGB;1555, RGB;565, XBGR;1555, and BGR;565. Previously it was possible to use the BGR;15 and BGR;16 modes for XRGB;1555 and RGB;565 data, but since those modes are being deprecated, these new packers are needed to keep the ability to write that data.

If I understand correctly, you're saying that at the moment, you can pack BGR;15 to BGR;15 and BGR;16 to BGR;16, but because those modes are deprecated, you'd like to add the ability to pack from RGB and RGBX to those rawmodes?

The modes are being deprecated because they aren't used. I don't think there is any need to try and retain their functionality.

Jun 26 '24 08:06 radarhere

to reduce the usage of multiplication and division for performance reasons.

Could you explain this? I'm not seeing anywhere that you've rewritten the C code of how a mode is packed or unpacked, so I don't understand why performance is affected.

Jun 26 '24 08:06 radarhere

The following code doesn't raise a deprecation warning.

>>> from PIL import Image
>>> im = Image.new("P", (1, 1))
>>> im.putpalette(b"0", "BGR;15")

Jun 26 '24 08:06 radarhere

void
ImagingUnpackRGB4B(UINT8 *out, const UINT8 *in, int pixels) {
    int i, pixel;
    /* RGB, 4 bits per pixel, little-endian */
    for (i = 0; i < pixels; i++) {
        pixel = in[0] + (in[1] << 8);
        out[R] = (pixel & 15) * 17;
        out[G] = ((pixel >> 4) & 15) * 17;
        out[B] = ((pixel >> 8) & 15) * 17;
        out[A] = 255;
        out += 4;
        in += 2;
    }
}
static void
ImagingUnpackXBGR4(UINT8 *out, const UINT8 *in, const int pixels) {
    /* XBGR, 4 bits per pixel, little-endian */
    for (int i = 0; i < pixels; i++) {
        const UINT8 b = in[1] & 0x0F;
        const UINT8 g = in[0] & 0xF0;
        const UINT8 r = in[0] & 0x0F;
        out[R] = (r << 4) | r;
        out[G] = g | (g >> 4);
        out[B] = (b << 4) | b;
        out[A] = 255;
        out += 4;
        in += 2;
    }
}

The existing function multiplies each value by 17 (17 == 255 // 15) to scale from 4 to 8 bits, while the new function uses a shift and an or. The math looks like this in Python:

bytes4 = list(range(2**4))
bytes4to8accurate = [round(x * 255 / 15) for x in bytes4]
bytes4to8current = [x * 255 // 15 for x in bytes4]
bytes4to8new = [x << 4 | x for x in bytes4]

Jun 26 '24 14:06 Yay295

to reduce the usage of multiplication and division for performance reasons.

The existing function multiplies each value by 17 (17 == 255 // 15) to scale from 4 to 8 bits, while the new function uses a shift and an or.

Oh, I had expected that you'd rewritten the existing functions.

Taking a look,

from PIL import Image
rawmodes = {
  "RGB;15": "XBGR;1555",
  "RGB;16": "BGR;565",
  "BGR;5": "XRGB;1555",
  "BGR;15": "XRGB;1555",
  "BGR;16": "RGB;565",
  "RGB;4B": "XBGR;4",
  "RGBA;4B": "ABGR;4",
  "RGBA;15": "ABGR;1555",
  "BGRA;15": "ARGB;1555",
  "BGRA;15Z": "ARGB;1555Z",
}
for old, new in rawmodes.items():
    mode = "RGBA" if "A" in old else "RGB"
    data = bytes()
    for i in range(65536):
        data += (i).to_bytes(2, 'big')
    im_old = Image.frombytes(mode, (65536, 1), data, "raw", old, 0, 1)
    im_new = Image.frombytes(mode, (65536, 1), data, "raw", new, 0, 1)
    if im_old.tobytes() != im_new.tobytes():
        print("Differences found:", old, new)
    else:
        print("Identical:", old, new)

I see

Differences found: RGB;15 XBGR;1555
Differences found: RGB;16 BGR;565
Differences found: BGR;5 XRGB;1555
Differences found: BGR;15 XRGB;1555
Differences found: BGR;16 RGB;565
Identical: RGB;4B XBGR;4
Identical: RGBA;4B ABGR;4
Differences found: RGBA;15 ABGR;1555
Differences found: BGRA;15 ARGB;1555
Differences found: BGRA;15Z ARGB;1555Z

The ones that are identical should be able to immediately replace the existing functions - https://github.com/Yay295/Pillow/pull/24

However, as for the replacements that produce different output, do you think the new output is more or less accurate?

Jun 29 '24 04:06 radarhere

I did mention it in the description of one of the commits, but I also have some Python to show the difference. For going from 5 bits to 8 bits:

bytes5 = list(range(2**5))
bytes5to8accurate = [round(x * 255 / 31) for x in bytes5]
bytes5to8current = [x * 255 // 31 for x in bytes5]
bytes5to8fast = [x << 3 | x >> 2 for x in bytes5]

The current method differs from the accurate rounding for 15 numbers, while the new method differs for only 4 numbers. For each number the difference is only 1.

For going from 6 to 8 bits the current method differs from the accurate rounding for 30 numbers, while the new method differs for only 10 numbers. Again for each number the difference is only 1.

bytes6 = list(range(2**6))
bytes6to8accurate = [round(x * 255 / 63) for x in bytes6]
bytes6to8current = [x * 255 // 63 for x in bytes6]
bytes6to8fast = [x << 2 | x >> 4 for x in bytes6]

Another benefit is that the new methods are roundtrippable. Currently going from 5 to 8 to 5 bits (or 6 to 8 to 6) will not produce the original values for most numbers.

Jun 29 '24 05:06 Yay295

Could you mention in the documentation that the output will be different?

Jun 29 '24 05:06 radarhere

I could have just committed that instead of rebasing, but I've been doing so much rebasing today I didn't think of it...

Jun 29 '24 05:06 Yay295