
inconsistent rgb/rgba behavior

Open shekh opened this issue 6 years ago • 14 comments

I see different maximum values depending on requested decoding format.

decode to CFHD_PIXEL_FORMAT_BGRA: result: r/g/b/a = 0xFF as expected

decode to CFHD_PIXEL_FORMAT_R210: result: r/g/b = 0x3FC (expected 0x3FF)

decode to CFHD_PIXEL_FORMAT_B64A:
sample CFHD_ENCODED_FORMAT_RGB_4444: r/g/b = 0xFF00, a = 0xFFFF (expected r/g/b 0xFFFF)
sample CFHD_ENCODED_FORMAT_RGB_444: r/g/b = 0xFF00, a = 0xFFF0

The main issue is the underscaled RGB; alpha seems OK where it is present.

shekh avatar Oct 31 '17 16:10 shekh

Internally all 444 formats are 12-bit, not 16-bit. So when 0xffff is input using CFHD_PIXEL_FORMAT_B64A, it is shifted to 0x0fff, encoded as 0x0fff, and shifted back to 0xfff0 for the pixel container precision (not 0xff00 as you have suggested; I think that was a typo). For the alpha channel the codec will clip to 0xffff as a special case, but I'm not sure that is the correct behavior for RGB data. The issue is round-trip loss if 0x0fff is scaled to 0xffff with a wavelet encode in between. Round-trip quality is a big concern for codecs. An optional scaled input and output pixel format would be straightforward to add, but the PSNR would drop slightly over round-trip encodes.
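
A minimal sketch of the unscaled 16 <-> 12-bit round trip described above (my own illustration, not the SDK's actual code path):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t in = 0xFFFF;                      /* 16-bit white via CFHD_PIXEL_FORMAT_B64A */
        uint16_t internal = in >> 4;               /* 0x0FFF: the internal 12-bit value */
        uint16_t out = (uint16_t)(internal << 4);  /* 0xFFF0: low 4 bits come back as zero */
        printf("in=0x%04X internal=0x%03X out=0x%04X\n", in, internal, out);
        return 0;
    }

This is why full-scale 16-bit input decodes to 0xFFF0 rather than 0xFFFF unless the output is rescaled.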

For CFHD_PIXEL_FORMAT_R210 I got 0x3FF for 0x3FF input (using a white frame).

dnewman-gpsw avatar Oct 31 '17 21:10 dnewman-gpsw

Maybe it depends on how the sample was encoded? I used the VFW codec, feeding RGBA32. There is some dithering/noise, but after some averaging I get 0xFF00, not 0xFFF0. If I understand correctly, the values are shifted without scaling both at encoding and at decoding. So to interpret the decoded data I also need to know how many bits were fed into the encoder?

shekh avatar Oct 31 '17 21:10 shekh

If you used VFW you only fed 0xff into the codec. I'm not convinced that 0x3ff (10-bit) or 0xffff (16-bit) should be on the output: 0x3fc is 0xff shifted up by 2 bits, so it seems to be working correctly. The ambiguity over the value of white when switching bit depths has long been known. After Effects used to use a 15+1-bit representation where white was 0x8000, maybe to help with this. As a codec's performance is measured by its ability to match its input, I think CineForm is correct here. Currently it matches the input, but if you want 0xff00 to represent white (or want a decoder to know that 8-bit input 0xff should become 0xffff on 16-bit output), then metadata would be the solution. Fortunately CineForm has an existing metadata engine that could be used; we just need to define a suitable FourCC.
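
To make the two interpretations concrete, a small sketch (my own illustration, not SDK code) contrasting the shift the codec performs with the bit-replicating scale being asked about, for 8-bit white widened to 10 bits:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t v8 = 0xFF;                        /* 8-bit white from VFW */
        uint16_t shifted = v8 << 2;                /* 0x3FC: what the codec produces */
        uint16_t scaled  = (v8 << 2) | (v8 >> 6);  /* 0x3FF: bit-replicated full scale */
        printf("shifted=0x%03X scaled=0x%03X\n", shifted, scaled);
        return 0;
    }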

dnewman-gpsw avatar Oct 31 '17 22:10 dnewman-gpsw

It seems SAMPLE_HEADER.input_format can be used as a clue.

shekh avatar Nov 01 '17 01:11 shekh

Yes, it might work; however, I'm still not convinced that gaining up the image should be the automatic behavior for 8-bit source encodes -- particularly if it impacts performance or round-trip quality.

dnewman-gpsw avatar Nov 01 '17 04:11 dnewman-gpsw

It is not an uncommon approach to scale RGB values when changing bit depth. Some examples:

Converting an 8-bit image in Photoshop to 16-bit scales 0xFF to 0xFFFF.

The r210 decoder in FFmpeg expands to 16-bit:

    *dst++ = r | (r >> 10);
    *dst++ = g | (g >> 10);
    *dst++ = b | (b >> 10);

Up/down conversion with precise rounding is quite trivial and cannot be a performance issue. I don't claim it is necessary to change the CineForm behavior; scaling or not can be left to the application. However, if the 8-bit source was encoded without scaling, it is best to deal with it as 8-bit on decoding (in comparison, the codecs found in FFmpeg explicitly differentiate each bit depth as a separate internal format). One more question: if active metadata is applied to modify the decoded picture, is the scaling the same (skipped)?
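
As an illustration of the "precise rounding is trivial" point, a minimal sketch (my own, not from the SDK) of an exact, invertible 8 <-> 16-bit scale using only integer math:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t  v8   = 0xFF;
        uint16_t v16  = (uint16_t)(v8 * 257);           /* 0xFF -> 0xFFFF, 0x00 -> 0x0000 */
        uint8_t  back = (uint8_t)((v16 + 128) / 257);   /* rounds, recovers the original */
        printf("v8=0x%02X v16=0x%04X back=0x%02X\n", v8, v16, back);
        return 0;
    }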

shekh avatar Nov 01 '17 12:11 shekh

Interesting, so for 444 decodes to 12-bit, the upscale to 16-bit would be:

    *dst++ = (r << 4) | (r >> 8);
    *dst++ = (g << 4) | (g >> 8);
    *dst++ = (b << 4) | (b >> 8);

For 8-bit input to 12-bit encodes:

    *dst++ = (r << 4) | (r >> 4);
    *dst++ = (g << 4) | (g >> 4);
    *dst++ = (b << 4) | (b >> 4);

Changing inputs before encoding is really weird, but I understand the logic. There would need to be new tuning for the DC offsets, but it is worth the experiment. I still expect 8-bit in to 8-bit out will decrease in PSNR; we can't know until trying.
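
A quick endpoint check of the two replications above (my own sketch, not SDK code), confirming they map full scale to full scale:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t r12 = 0xFFF;
        uint16_t r8  = 0xFF;
        /* 12-bit -> 16-bit: 0xFFF -> 0xFFFF, 0x000 -> 0x0000 */
        printf("12->16: 0x%04X\n", (r12 << 4) | (r12 >> 8));
        /* 8-bit -> 12-bit: 0xFF -> 0xFFF */
        printf("8->12:  0x%03X\n", (r8 << 4) | (r8 >> 4));
        return 0;
    }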

dnewman-gpsw avatar Nov 01 '17 15:11 dnewman-gpsw

If Active Metadata were used, this would be more compute-expensive, as it involves additional buffer copies and the use of multiplies rather than shifts. The CDL value for RGB Gain would be 1.00391 to take 0xff00 to 0xffff.
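
For reference, that gain is just the ratio of the two white points; a tiny sketch (my own) verifying the number:

    #include <stdio.h>

    int main(void)
    {
        double gain = 65535.0 / 65280.0;   /* 0xFFFF / 0xFF00 = 1.00391 (rounded) */
        printf("gain=%.5f  0xFF00 * gain = %.1f\n", gain, 0xFF00 * gain);
        return 0;
    }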

dnewman-gpsw avatar Nov 01 '17 16:11 dnewman-gpsw

Multipliers are a bad idea, IMHO, since they would introduce rounding errors and you would no longer be able to recover the "original" data. Filling the lower bits with the upper bits of the same value is a common thing to do in image applications. It makes sure you can retrieve the original data, and it makes sure you get full-scale output when the input was full-scale (or zero when the input was zero).

But I'd vote against changing anything. Just add a comment to the header that defines the CFHD_PixelFormats (i.e. CFHDTypes.h) saying that on decode only the upper 12 or 10 bits contain valid data. That way the user can decide how to expand them to true 16-bit. Some people might want to convert the output to float anyway, so any extra ORing-in of lower bits is wasted processing time if you can instead use the proper multiplier to turn it into float directly.
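
A sketch of the two options described above (my own illustration, assuming the decoder returns valid data only in the upper 12 bits):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t v = 0xFFF0;              /* decoded B64A sample, low 4 bits zero */

        /* Option 1: OR in the upper bits to get true 16-bit, then normalize. */
        uint16_t wide = v | (v >> 12);    /* 0xFFFF */
        float f1 = wide / 65535.0f;

        /* Option 2: skip the widening and use the proper 12-bit divisor directly. */
        float f2 = (v >> 4) / 4095.0f;

        printf("f1=%f f2=%f\n", f1, f2);  /* both 1.0 for full-scale input */
        return 0;
    }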

olafmatt avatar Nov 02 '17 11:11 olafmatt

I am trying to verify something. I made a white rectangle and saved it as RGBA using the RGBA32 source format. On decoding to B64A it is seen as r,g,b = 0xFF00, a = 0xFFFF. But when I apply active metadata (framing: vertical offset) it becomes r,g,b = 0xFFDF, a = 0xFFE0. I don't understand what happens.

shekh avatar Nov 24 '17 12:11 shekh

I cannot fully confirm this, but I also see some strange stuff going on.

Edit: Spoke too soon: I just made two files encoding pure white (0xFFFF for all channels), one with quality set to High and one to Filmscan1; both files decode to 0xFFF0 for RGB and 0xFFFF for A. A file encoded as just RGB has the alpha channel set to 0xFFF0 when decoded to RGBA.

But since we're dealing with lossy compression, I could somehow understand if not all values came out at exactly the same numerical value they went in at. But seeing something with more than 12 bits of precision does look wrong indeed.

I have a feeling that creating a new decoder will not give you a fully initialized decoder. Some months ago (before this was open-sourced) I was writing a plugin for a commercial application where we reuse a limited set of decoders for whatever clips the user might be playing. So sometimes a decoder that was previously used on an RGB-encoded clip would then be used for a YUV clip, and so on. This often gave YUV frames that were too light. The only solution we found was to destroy the decoder and create a new one when switching between RGB(A) and YUV clips. Similar stuff happens with the encoder... try to first encode YUV (from the YU64 format) and then use the same encoder (after a CFHD_PrepareToEncode(), of course) to encode RGB from the RG64 format. You'll get something that looks like RGB 4:2:2.

olafmatt avatar Nov 24 '17 14:11 olafmatt

CFHD_PrepareToEncode() must be called between any format change, so that is not surprising.

shekh, once you enable active metadata with any change, the frame does go through more processing than just a vertical offset, but that error seems more pronounced than I would have expected. That will take some effort to track down.

dnewman-gpsw avatar Nov 24 '17 16:11 dnewman-gpsw

There seems to be one more thing wrong with the alpha channel encoding. It seems the alpha channel gets a curve applied on encode that maps the 0-4095 range into 256-3823; 0 and "very high" values are then excluded from being run through that curve, presumably in order to make sure that even with rounding errors (i.e. lossy compression) fully transparent and fully opaque always come out of the decoder undamaged.

However, the condition for applying the curve is this (in ConvertBGRA64ToFrame_4444_16s() and similar functions in frame.c):

if(a > 0 && a < (255<<4))

I'd say this should be changed to:

if(a > 0 && a < 4095)

Otherwise all alpha values from 4080 on will come out of the decoder as 4095.

This curve applied to the alpha channel also explains why identical values for R, G, B and A show different "rounding errors" for RGB and A when being encoded and decoded again.

Also in frame.c there are cases where decoding from YUV or RGB into an RGBA output format causes the alpha channel to be filled with 0xFFFF (instead of taking the maximum value for the respective bit depth). In case there is an alpha channel present, the decoder removes the curve, shifts up by 4 bits, and then clips the maximum value to 0xFFFF; see ConvertLowpass16sRGBA64ToRGBA64() in frame.c. Since 4095 got encoded without the curve applied, such a value would now get the curve "removed", which means we end up with a value above 4095. In both cases it should allow a maximum alpha channel value of either 0xFFF0 or 0xFFC0, depending on bit depth.

Edit: There are also several cases in convert.c where the max. value gets limited to USHRT_MAX.
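
To make the guard change concrete, a minimal sketch (my own; alpha_curve here is a hypothetical linear stand-in for the actual curve in frame.c, which maps 0-4095 into 256-3823 and may differ in detail):

    #include <stdio.h>

    /* Hypothetical placeholder for the encoder's alpha curve. */
    static int alpha_curve(int a) { return 256 + a * (3823 - 256) / 4095; }

    int main(void)
    {
        int a = 4090;   /* alpha between 4080 and 4095 */

        /* Current guard: 255<<4 == 4080, so 4080..4094 skip the curve ... */
        int enc_old = (a > 0 && a < (255 << 4)) ? alpha_curve(a) : a;
        /* ... and later decode as if the curve had been applied, landing at 4095. */

        /* Proposed guard: only 0 and 4095 bypass the curve. */
        int enc_new = (a > 0 && a < 4095) ? alpha_curve(a) : a;

        printf("old=%d new=%d\n", enc_old, enc_new);
        return 0;
    }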

olafmatt avatar Aug 06 '18 17:08 olafmatt

I agree ConvertBGRA64ToFrame_4444_16s() and ConvertRGBA64ToFrame16s() should be using if(a > 0 && a < 4095)

Fixing that.

dnewman-gpsw avatar Aug 16 '18 18:08 dnewman-gpsw