mozjpeg
mozjpeg copied to clipboard
Use low-pass edge extension to pad images
When encoding an image with dimensions that are not a multiple of 16, both mozjpeg and libjpeg-turbo use an edge extension that simply copies the pixels from the right and bottom edge into the padded area. Because this image data is not displayed we can use any edge extension we want. Both daala and theora use a low-pass filter to extend the frame and this is shown to have very modest compression gains, essentially for free.
Talked a bit with a DSP friend, and he informed me that you can compute a globally optimal linear filter for codecs with a fixed block size and non-lapped DCT, and no sample-domain prediction (i.e. JPEG).
What this means is we can guarantee at least N
zero DCT coefficients after applying FDCT if we prefilter using this method. i.e. good for compression. N depends on the number of pixel values vs padding.
In hindsight, it seems obvious. But hindsight is 20:20.
The prefiltering/edge extension algorithm would end up like:
Given the original edge 8x8 block (pre-padding):
x1 y1 z1 _ _ _ _ _
x2 y2 z2 _ _ _ _ _
x3 y3 z3 _ _ _ _ _
x4 y4 z4 _ _ _ _ _
x5 y5 z5 _ _ _ _ _
x6 y6 z6 _ _ _ _ _
x7 y7 z7 _ _ _ _ _
x8 y8 z8 _ _ _ _ _
Start on the first row:
x1 y1 z1 _ _ _ _ _
Let's label them:
x1 y1 z1 a1 b1 c1 d1 e1
a1
through e1
will be a linear combination of x
, y
, and z
.
So:
We want to solve: Fx = y
Where:
x = [x1 y1 x1]
y = [a1 b1 c1 d1 e1]
F = <select from the 7 possible cases>
F
here, will be precomputed for 7x1, 6x2, 5x3, and so on. i.e. 7 hardcoded table in the code total, which are derived based on the DCT we use:
T' = 8-point IDCT
We want:
T'[u v w 0 0 0 0 0] = [x y z a b c d e]
dice them up in 4, based on where the zeroes are,
so e.g. for the 5x3 case, T' is split into:
[T11 T12]
[T21 T22]
Where, sizes are:
3x3 3x5
5x3 5x5
So:
T'11[u;v;w] + T'21[0...] = [x;y;z] and
T'12[u;v;w] + T'22[0...] = [a;b;c;d;e]
Eliminating 0s, and knowing that T`11 is invertible (I was informed of this property):
[u;v;w] = inv(T'11)[x yz]
->
[a;b;c;d;e] = T'12[u;v;w] = T'12inv('T11)[x;y;z]
thus F = T'12inv(T'11)
... yeah so, I'll work these out later :tongue:.
Compute y
:
Fx = y
Which gets us:
x1 y1 z1 a1 b1 c1 d1 e1
x2 y2 z2 _ _ _ _ _
x3 y3 z3 _ _ _ _ _
x4 y4 z4 _ _ _ _ _
x5 y5 z5 _ _ _ _ _
x6 y6 z6 _ _ _ _ _
x7 y7 z7 _ _ _ _ _
x8 y8 z8 _ _ _ _ _
Compute for all rows to get:
x1 y1 z1 a1 b1 c1 d1 e1
x2 y2 z2 a2 b2 c2 d2 e2
x3 y3 z3 a3 b3 c3 d3 e3
x4 y4 z4 a4 b4 c4 d4 e4
x5 y5 z5 a5 b5 c5 d5 e5
x6 y6 z6 a6 b6 c6 d6 e6
x7 y7 z7 a7 b7 c7 d7 e7
x8 y8 z8 a8 b8 c8 d8 e8
Which is our final padded block.
For bottom/top, you'd do cols instead of rows.
I think I got this all right. Maybe @negge can comment?
EDIT: Clarity.
OK so I went through and tried what a listed above, and it had one issue: it was possible to generate dct coeffs larger than jpeg can handle.
However, Tim Terriberry linked me to a paper he wrote which can probably be applied instead, which is somewhat similar.
I will look into that.
Nathan, are you still planning to submit a patch for this?
@negge @dwbuiten Can one of you resolve this issue soon? This should be in mozjpeg 3.0, and we'd like to start preparing for a release.
I've been very busy with life/work. I'll revisit this early 2015.