[WIP] Add Color Filter Array (Bayer)
Work in progress; I would like to be sure there is agreement about the path to take before adapting more things (quant_table_set_index_count...)
Since this is a new colorspace_type value, we can add it to FFV1 v0-v3 without breaking anything (old decoders would only reject the file with a "colorspace not supported" message).
It depends on https://github.com/FFmpeg/FFV1/pull/89; only the last commit is part of this PR.
Link to document for YDgCoCg-R, which shows some theory and formulae.
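For readers without the document at hand, a hedged sketch of the YDgCoCg-R forward transform as it is commonly formulated (the linked document may differ in details):

```c
/* Hedged sketch of YDgCoCg-R as commonly formulated in the literature
 * (the linked document may differ in details): Dg carries the difference
 * between the two greens of a 2x2 RGGB cell, and YCoCg-R (Malvar/Sullivan)
 * is applied to R, B and the rebuilt green mean. Every step is integer
 * and losslessly reversible. */
static void ydgcocg_r_forward(int r, int g1, int g2, int b,
                              int *y, int *dg, int *co, int *cg)
{
    int g, t;
    *dg = g2 - g1;
    g   = g1 + (*dg >> 1);   /* lossless mean of the two greens */
    *co = r - b;
    t   = b + (*co >> 1);
    *cg = g - t;
    *y  = t + (*cg >> 1);
}
```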
Work in progress; I would like to be sure there is agreement about the path to take before adapting more things (quant_table_set_index_count...)
Good that you ask. First, before the specification or draft can be changed, there needs to be research and testing of compression algorithms to compress Bayer. These results should be publicly discussed on CELLAR, and then we can decide what to do; possibly more algorithms would be suggested, implemented, and tested for speed, complexity and compression rate. Once all that is done, the best option could then be added to the specification. If I didn't miss some communication on this, then I don't think we are at the stage where we can make any changes to the specification.
testing of compression algorithms
The idea here is actually to not change the compression algorithm (on purpose): the suggested transformation is similar to what already exists for RGB (the "only" addition is Dg, a green difference, with preliminary compression tests already done showing compression similar to merging the greens), and it reuses all other parts of the current specification (so no new algorithm; the change in the encoder and decoder is limited to the number of components to handle and the transformation layer). This is an easy step, both in the bitstream specification and in encoders/decoders, for supporting a different "color space".
So speed is similar to the existing colorspace_type values, additional complexity is near 0, and the compression rate is similar to RGB.
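To make the scope concrete, a minimal sketch of the transformation layer as I understand the proposal (hypothetical code, not the actual patch): de-interleave the RGGB mosaic into four planes and store Dg instead of G2; prediction and entropy coding stay untouched.

```c
#include <stdint.h>

/* Hypothetical sketch, not the actual patch: split a w x h RGGB mosaic
 * into four (w/2) x (h/2) planes, replacing G2 by Dg = G2 - G1.
 * Everything downstream is unchanged FFV1 coding. */
static void bayer_split(const uint16_t *mosaic, int w, int h,
                        uint16_t *r, uint16_t *g1, int16_t *dg, uint16_t *b)
{
    for (int y = 0; y < h / 2; y++)
        for (int x = 0; x < w / 2; x++) {
            const uint16_t *p = mosaic + (2 * y) * w + 2 * x;
            int i = y * (w / 2) + x;
            r[i]  = p[0];            /* R  at (2x,   2y)   */
            g1[i] = p[1];            /* G1 at (2x+1, 2y)   */
            b[i]  = p[w + 1];        /* B  at (2x+1, 2y+1) */
            dg[i] = p[w] - p[1];     /* Dg = G2 - G1       */
        }
}
```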
In my opinion there is a difference between coder_type (the place for new compression algorithms to be tested) and colorspace_type (the place for new transformations), and I suggest separating both issues: here the idea is not to add a new coder_type more suitable for some specific color space, but to add a new colorspace_type.
My analysis:
There is a paper indicating better encoding after a YCoCg transformation: http://ftp3.itu.ch/av-arch/jvt-site/2003_09_SanDiego/JVT-I014r3.doc
There is a paper indicating nearly no difference in size between GBBR and GBR after compression: https://pdfs.semanticscholar.org/e9ec/fac76723f0e5a3009ea3cf27584bf388e7a0.pdf
I didn't find any other paper on a Bayer-to-"Y something" lossless transformation, and I don't have any other idea (using a modified JPEG-2000 RCT? I doubt it would be better, as YCoCg is nowadays considered more suitable than the older JPEG-2000 RCT).
The cost in R&D of finding a potentially better transformation seems high compared to the potential gain, judging from the available papers.
The suggested addition has exactly the goal of not adding any new compression algorithm, in order to not increase the complexity of the encoder/decoder when adding Bayer support (only a few lines of code for the transformation algorithm and an increase of the plane count from 3 to 4).
This does not prevent studying a better coder_type, potentially more adapted to such a color space, as separate work.
I think it is better to support Bayer with additional complexity near 0 today than to wait for a potentially better algorithm, knowing that adding this colorspace_type does not block any implementation of a better coder_type when one is found.
In summary, I think it is better to have basic support now, at very low (near 0) cost, than nothing due to the high cost of R&D that nobody is ready to pay.
I already put a placeholder on CELLAR: https://www.ietf.org/mail-archive/web/cellar/current/msg01398.html ; I can put a copy of this text on CELLAR if you prefer.
Note: the priority for me is to be able to send the current spec to IETF standardization, so this patch and this debate have a much lower priority than all the other PRs related to clarification of what already exists. If there is a need to prioritize the handling of PRs, please handle the other PRs first.
You have a point that this is a low-complexity solution. But I disagree that doing better has a high R&D cost; also, this is the wrong place to discuss it, it clearly belongs on CELLAR, so I'll try to reply there. About priority, I am very interested in extending FFV1 to Bayer and other features; I must admit that the v3 standardization work is a bit boring compared to that :)
I must admit that the v3 standardization work is a bit boring compared to that :)
It is not fun for me either, but it is a blocker for working on new features.
A decoder rejects the stream on an unknown coder or colorspace, so no need for a version bump here IMO. I guess we need to decide what "reserved" means, but IIRC e.g. H.264 does the same for reserved stuff (the stream is just not decoded). I am in favor of not increasing the version number for reserved values we decide to use, but it should be a global debate for all reserved values (then I adapt this PR).
@JeromeMartinez ok with me. Still some hanging comments from https://github.com/FFmpeg/FFV1/pull/100#discussion_r167399455 and a rebase is needed for review.
Added "Inferred to be 0 if not present." for chroma_subsample. Rebased. But still on hold due to the need of test files and test code.
What is the status of this?

On my side, a lack of sample files for any demonstration. And I am now thinking of JPEG-2000 RCT + green difference, which would be simpler in the v3 code. YCoCg could come later (for normal RGB too).
Files should be available on the websites of the sensor and camera manufacturers. (I can send you some as soon as I am back at the lab.)
Finally got some files, directly supported by FFmpeg. Unfortunately only one kind of source ("Magic Lantern" MLV files, from different Canon cameras), but IMO enough for having a first version of the support. ~10 MLV RGGB files (lifetime 1 week, I'll find a more suitable place soon). Note: they are 10/12/14-bit content, but FFmpeg supports only 16-bit RGGB, so I work on upscaled content for the moment. IMO this is not a blocker for the specification and the code in FFmpeg.
I also would like to prioritize effective support with small changes in the short term over lots of changes (based on studies of algorithm performance, etc.) in the long term, so the new proposal keeps most of the internal structure of current FFV1 v0-v3:
- Keeping "useless" (for this input format) metadata like chroma subsampling, similar to what was done for RGB,
- Keeping the same color transformation as RGB (JPEG2000-RCT; see the sketch after this list),
- Permitting only a 2x2 pattern dimension, as the current code and specs permit only up to 4 planes.
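For reference, a sketch of that RCT pair as FFV1 v0-v3 already defines it for RGB (reused unchanged here; only the plane count differs):

```c
/* JPEG2000-RCT as already specified by FFV1 v0-v3 for RGB; reused as-is. */
static void rct_forward(int r, int g, int b, int *y, int *cb, int *cr)
{
    *cb = b - g;
    *cr = r - g;
    *y  = g + ((*cb + *cr) >> 2);
}

static void rct_inverse(int y, int cb, int cr, int *r, int *g, int *b)
{
    *g = y - ((cb + cr) >> 2);
    *r = cr + *g;
    *b = cb + *g;
}
```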
This permits minimal changes to the FFmpeg code for v0-v3, in order to have an immediately working implementation, and we could work on better support with common optimizations for YUV/RGB/RGGB in v4 (see below).
Not reinventing the wheel: the metadata for RGGB support are based on how TIFF/EP & EXIF (a direct view of CFAPattern) add support for RGGB:
- Using the now more generic name CFA = Color Filter Array.
- Using the same kind of metadata as CFAPattern in TIFF/EP.
Notes:
- TIFF/EP has CFARepeatPatternDim for the pattern dimension, and EXIF includes it in CFAPattern, but the FFmpeg code would be more complicated with something other than 2x2, so I don't put it in this version of the spec at least; I can add similar fields in FFV1 if preferred.
- CR2 uses CR2CFAPattern with an index (1 = RGGB, 2 = BGGR, 3 = GBRG, 4 = GRBG), but it is less versatile (CR2CFAPattern can be transformed to CFAPattern, the inverse is not always true) and less standardized, so I preferred the TIFF/EP method even if it is a bit more verbose.
- I don't make a difference between FFV1 versions: as this is a new colorspace item, all FFV1 versions could have this new colorspace value, and I see no reason to limit it to one version, as we have a colorspace field for that (FFmpeg rejects such a stream in any version). If preferred, I add it to v3 only.
- R, G, B are numbered (0, 1, 2) as in TIFF/EP, EXIF, DNG (illustrated just below).
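As a pure illustration (not spec text), with that numbering the four 2x2 patterns mentioned above read, in row-major order:

```c
#include <stdint.h>

/* Illustration only: CFAPattern values for the four 2x2 patterns,
 * using the TIFF/EP numbering 0=R, 1=G, 2=B, in row-major order. */
static const uint8_t cfa_rggb[4] = { 0, 1, 1, 2 };
static const uint8_t cfa_bggr[4] = { 2, 1, 1, 0 };
static const uint8_t cfa_gbrg[4] = { 1, 2, 0, 1 };
static const uint8_t cfa_grbg[4] = { 1, 0, 2, 1 };
```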
Beside the spec update, I implemented support for RGGB in FFmpeg: FFmpeg patch for FFmpeg FFV1 RGGB support. As you can see, it is pretty simple and does not make the code more complicated or slower for other pix_fmts. If we agree on this spec, I will update it with other CFA patterns and coherency tests (width/height must be multiples of 2, tests of chroma/transparency values...), and I will add some text about "Color Filter Array" ("TODO" for the moment; I would like us to agree on the bitstream before I write the text about it).
That said, I noticed during my tests that not using JPEG2000-RCT provides better compression (~10% better) with the samples I have (which are not representative). I also ran some tests on classic RGB files, and noticed that with some RGB files it is the same (1-10% better without JPEG2000-RCT) when I hack the FFV1 code in FFmpeg to force no JPEG2000-RCT on RGB content.
In my opinion, using or not using JPEG2000-RCT is a more global issue not related to RGGB support, and should be an item to discuss for v4 (transform, no transform, for the whole stream or per slice; I actually already see some code about that in the v4 draft), so here we should just use the current code (so JPEG2000-RCT) without complicated additional stuff (because these are the same bitstream versions and we just add a new colorspace), and debate the relevance of JPEG2000-RCT during v4 work, keeping the transformation code common between RGB and RGGB.
Rebased + updated based on comments from Dave.
I also changed the title, as the previous title was no longer relevant to the proposed changes.
That said, I noticed during my tests that not using JPEG2000-RCT provides better compression (~10% better) with the samples I have (which are not representative). I also ran some tests on classic RGB files, and noticed that with some RGB files it is the same (1-10% better without JPEG2000-RCT) when I hack the FFV1 code in FFmpeg to force no JPEG2000-RCT on RGB content.
In my opinion, using or not using JPEG2000-RCT is a more global issue not related to RGGB support, and should be an item to discuss for v4 (transform, no transform, for the whole stream or per slice; I actually already see some code about that in the v4 draft), so here we should just use the current code (so JPEG2000-RCT) without complicated additional stuff (because these are the same bitstream versions and we just add a new colorspace), and debate the relevance of JPEG2000-RCT during v4 work, keeping the transformation code common between RGB and RGGB.
10% compression gain by not doing a step sounds like a good idea. We are designing a new standard here with Bayer support, and if we do not want to do it properly then we should NOT do it. Also, skipping a step does not sound like it should make things more complex.
About the two green planes: IIUC the proposal stores the "even" and "odd" green samples in 2 different planes. That would separate the most correlated samples of the green plane, or am I misunderstanding? What other ways have been tried to split the too-large green plane so it can be "hacked" into the smaller green and alpha planes? (if that is the goal) There are many ways by which a plane can be split into 2 smaller ones; I think keeping highly correlated samples together should help compression.
There are many ways by which a plane can be split into 2 smaller ones; I think keeping highly correlated samples together should help compression.
I fully agree.
We are designing a new standard here with Bayer support
Yes and no: we can also consider that we reuse most of the current standard. It is compression ratio vs. additional complexity. Do I understand correctly that you would accept more changes in the spec and the FFmpeg code for this support?
Also, skipping a step does not sound like it should make things more complex
The issue is that I have a poor set of sample files. And my classic RGB files also compress better without JPEG2000-RCT, but you implemented JPEG2000-RCT for a reason, I guess, so I am reluctant to consider my samples exhaustive. In my opinion we should work on an option for JPEG2000-RCT for all color spaces including RGB; this means breaking the bitstream, so next version. Anyway, if we move to another solution (see below), we can't use JPEG2000-RCT, so this part of the discussion is no longer needed.
There are many ways by which a plane can be split into 2 smaller ones; I think keeping highly correlated samples together should help compression.
I can propose a solution "merging" both greens, but it makes the bitstream changes and the code bigger. I am OK with trying to implement that, but please in that case do not reject the changes due to too many changes and/or not enough samples.
There are many ways by which a plane can be split into 2 smaller ones; I think keeping highly correlated samples together should help compression.
Just to be clear: I do not split a plane into 2 smaller ones; the CFA input comes with 2 green "planes". So only one "way" to split is used, the one from the CFA hardware.
That would separate the most correlated samples of the green plane, or am I misunderstanding?
I understand that you are thinking of merging the first and second greens in some way. The assumption would be that the first and second greens are correlated, so better compression, but after tests merging the G1 and G2 planes, compression is worse (same outcome as trying the G difference: I would have expected that coding the G difference would improve compression, but it is not the case). Maybe due to the offset both vertically and horizontally, so that the prediction does not work well? Code for merged G2 then G1, to be used only for tests, as I hardcode RGGB (I will write clean code when the spec is decided).
With:
+----+----+
| R | G1 |
+----+----+
| G2 | B |
+----+----+
I tested with my sample files:
- R, G1, B, G2 each in their own plane (current PR), without transformation
- R, G1 then G2, B: 3 planes, "G1 then G2" being 2x bigger than the others (see the layout sketch below): 0.5-2.8% worse (1.5% worse on average)
- R, G2 then G1, B: 3 planes, "G2 then G1" being 2x bigger than the others: between 0.5% better and 2.5% worse (1.0% worse on average)
One advantage of this scenario (merged G1 & G2) is that we keep the 4th plane for transparency, but compression is worse (and if we want to keep transparency support, we can just decide to accept up to 5 planes).
I think I tested all ideas now.
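For reference, one plausible reading of the "G1 then G2" layout tested above (an assumption for illustration; the test code may differ):

```c
#include <stdint.h>

/* One plausible reading of the "G1 then G2" merged plane (assumption for
 * illustration; the test code may differ): for each pair of mosaic rows,
 * emit the G1 row, then the G2 row, giving a single (w/2) x h plane.
 * "G2 then G1" swaps the two rows. */
static void merge_greens(const uint16_t *mosaic, int w, int h, uint16_t *g)
{
    for (int y = 0; y < h / 2; y++)
        for (int x = 0; x < w / 2; x++) {
            g[(2 * y)     * (w / 2) + x] = mosaic[(2 * y)     * w + 2 * x + 1]; /* G1 */
            g[(2 * y + 1) * (w / 2) + x] = mosaic[(2 * y + 1) * w + 2 * x];     /* G2 */
        }
}
```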
My personal preference is to use the implementation which will be accepted in the v3 (and v0/v1?) spec before the standard is released, so please vote for your preference based on my tests (which are not exhaustive regarding sample files, but this is all I have right now) and I'll implement it. I see decisions to take on the following parts ("permit both" = a flag in the header):
- 1a/ use 1 plane for G1 and 1 plane for G2 info, 1b/ use 1 plane for G1&G2, 1c/ permit both
- 2a/ use no transformation, 2b/ use JPEG2000 RCT (if 1a), 2c/ permit both
- 3a/ use G2 as is, 3b/ use G2-G1 (if 1a), 3c/ permit both
- 4a/ use alpha_plane for storing G2 info so we keep 4 planes maximum, 4b/ permit up to 5 planes
If I have to choose, I would choose 1a (trying to avoid code complexity), 2c (not complex to have both), 3c (not complex to have both), and I have mixed feelings about 4 (no RGGBA pix_fmt in FFmpeg or other tools, vs. it does no harm to specify an alpha plane without an implementation, in case of a future need; so in practice implementations keep 4 planes until transparency support is needed).
We are designing a new standard here with Bayer support
Yes and no: we can also consider that we reuse most of the current standard. It is compression ratio vs. additional complexity. Do I understand correctly that you would accept more changes in the spec and the FFmpeg code for this support?
yes
There are many ways by which a plane can be split into 2 smaller ones; I think keeping highly correlated samples together should help compression.
Just to be clear: I do not split a plane into 2 smaller ones; the CFA input comes with 2 green "planes". So only one "way" to split is used, the one from the CFA hardware.
If the spectral response for the samples is the same, then there is one plane. If the spectral response is not the same, then one is not green. So there is only one green plane. What any API or hardware gives us is irrelevant; for all we know it could give us the samples in 8 planes of 2 bits per sample. That wouldn't result in us wanting to store 8 planes.
That would separate the most correlated samples of the green plane, or am I misunderstanding?
I understand that you are thinking of merging the first and second greens in some way. The assumption would be that the first and second greens are correlated, so better compression, but after tests merging the G1 and G2 planes, compression is worse (same outcome as trying the G difference: I would have expected that coding the G difference would improve compression, but it is not the case).
There are many ways to subtract the planes; if this was attempted, then it would be a kind of storing the difference for each pixel based on its prediction from the other plane. For this, all pixels of the first plane, and also the red and blue planes, might be usable to predict the 2nd green one. The same can be done with the other planes too. This alone is probably weeks or months of full-time work if done and tested properly. Also, such a prediction system is not specific to Bayer RGGB but could be done with anything except grayscale.
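To make the idea concrete, a toy sketch of one such predictor (hypothetical, not a tested design): predict each G2 sample from its two nearest G1 neighbours and code only the residual.

```c
#include <stdint.h>

/* Toy sketch of inter-plane prediction (hypothetical, not a tested design):
 * for RGGB, the G2 sample at plane position (i, j) lies in the mosaic row
 * below the G1 samples at plane positions (i-1, j) and (i, j); predict it
 * as their rounded mean and store the residual. Planes are (w/2) x (h/2);
 * the left border reuses the nearest available G1 sample. */
static void predict_g2_from_g1(const uint16_t *g1, const uint16_t *g2,
                               int16_t *res, int pw, int ph)
{
    for (int j = 0; j < ph; j++)
        for (int i = 0; i < pw; i++) {
            int a = g1[j * pw + (i > 0 ? i - 1 : 0)];  /* left G1 neighbour  */
            int c = g1[j * pw + i];                    /* right G1 neighbour */
            res[j * pw + i] = g2[j * pw + i] - ((a + c + 1) >> 1);
        }
}
```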
Maybe due to the offset both vertically and horizontally, so that the prediction does not work well? Code for merged G2 then G1, to be used only for tests, as I hardcode RGGB (I will write clean code when the spec is decided).
With:
+----+----+
| R | G1 |
+----+----+
| G2 | B |
+----+----+
I tested with my sample files:
- R, G1, B, G2 each in their own plane (current PR), without transformation
- R, G1 then G2, B: 3 planes, "G1 then G2" being 2x bigger than the others: 0.5-2.8% worse (1.5% worse on average)
- R, G2 then G1, B: 3 planes, "G2 then G1" being 2x bigger than the others: between 0.5% better and 2.5% worse (1.0% worse on average)
One advantage of this scenario (merged G1 & G2) is that we keep the 4th plane for transparency, but compression is worse (and if we want to keep transparency support, we can just decide to accept up to 5 planes).
I think I tested all ideas now.
Try storing the green in diagonal scans. If that makes no sense, take a paper with an RGGB raster on it and rotate it by 45°: you will see that all the green samples align in a nice raster.
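A sketch of what such a diagonal scan could look like (hypothetical, just to illustrate the geometry):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the 45° green scan: in an RGGB mosaic the greens
 * sit where (x + y) is odd, so every odd anti-diagonal x + y == d holds
 * only greens, and those greens are immediate diagonal neighbours.
 * Scanning them diagonal by diagonal yields the rotated raster; note that
 * the line length varies with d. */
static void greens_to_diagonals(const uint16_t *mosaic, int w, int h,
                                uint16_t *out /* w * h / 2 samples */)
{
    size_t n = 0;
    for (int d = 1; d <= w + h - 2; d += 2) {     /* odd anti-diagonals */
        int y0 = d > w - 1 ? d - (w - 1) : 0;     /* keep x = d - y < w */
        int y1 = d < h ? d : h - 1;               /* keep y < h         */
        for (int y = y0; y <= y1; y++)
            out[n++] = mosaic[y * w + (d - y)];   /* x = d - y */
    }
}
```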
This alone is probably weeks or months of full-time work if done and tested properly.
This is something I cannot afford for the moment; I suggest that we find something intermediate, doing some tests in order to find a good ratio between time spent and compression. Wanting the best compression now will just end with nothing as the outcome; I think it is better to have something "good enough" than nothing, and IMO it is better to demonstrate that we already compress not too badly, rather than show nothing, when looking for funding to improve compression.
With my sample files, the stream size is divided by 2 to 3, which is already very useful compared to just nothing, as it is now.
Try storing the green in diagonal scans. If that makes no sense, take a paper with an RGGB raster on it and rotate it by 45°: you will see that all the green samples align in a nice raster.
Got it. Besides more code (you said it is fine, so OK; in that case the width changes every line, not too difficult to handle but more complex code), and as we read memory in a "random" order (not sequential, e.g. 0,3 then 1,2 then 2,1 then 3,0), I am afraid of some memory speed impact (maybe low compared to a potential compression gain).
Can we have this deal: I test (with crappy but working FFmpeg code for the encoder and decoder, same framemd5 for the input and the encode-decode output) the compression ratio with the 45° rotation for green, on the sample files I have, and then, based on the results, we choose between the different options listed in this discussion?
If I find enough time for this, I am planning to pass our implementation to Carl Eugen next week in London. We already discussed it briefly at the Video Dev Days one month ago in Paris. (I am afraid my health is too weak for the toxic and unwelcoming environment inside FFmpeg.)
This alone is probably weeks or months of full-time work if done and tested properly.
This is something I cannot afford for the moment
This can be reduced by applying techniques derived from so-called artificial intelligence (to me, this is one of the rare cases where using it makes sense). I guess I mentioned that in passing at No Time to Wait last year in Vienna. It is exactly what we did when we implemented Bayer on our side, but we did only what was actually useful in production. I do not have the money to pay for fundamental research…
Doing it in a systematic way could possibly be an interesting project for the next G**gle Summer of Code.
Doing it in a systematic way could possibly be an interesting project for the next G**gle Summer of Code.
I think you overestimate the abilities and interest of students. As someone who has mentored GSoC students in the past, I think very few of them are capable of and "interested" in doing this. Also, I've tried this with adding motion estimation and compensation to FFV1: the student did something that "works", but it's not usable in its current form. So unless we find a 1-in-100 student for this, I suspect that doing this in GSoC would be a similar amount of work, or more, for the mentor with a student than without one. And that would defeat the purpose. But if we do find a 1-in-100 student who wants to work on FFV1, and other mentors/admins don't object, it's certainly a good idea to accept her/him for some FFV1-related task. Still, I think there is much more in FFV1 that could use man-hours than there are highly qualified and motivated students available. What I am trying to say is: we shouldn't expect this to become a GSoC project, but rather keep our eyes open for other solutions, and IF there is an opportunity in GSoC that accelerates something in FFV1, then that's even better, but I wouldn't hold my breath here ...
This alone is probably weeks or months of full-time work if done and tested properly.
This is something I cannot afford for the moment; I suggest that we find something intermediate, doing some tests in order to find a good ratio between time spent and compression. Wanting the best compression now will just end with nothing as the outcome; I think it is better to have something "good enough" than nothing, and IMO it is better to demonstrate that we already compress not too badly, rather than show nothing, when looking for funding to improve compression.
I don't think the plane prediction is tied to Bayer. This is a valuable and time-consuming thing to work on, but I don't think it's blocking for this here. We can continue without plane-to-plane prediction and maybe add a placeholder field in the header to select other predictions.
Try storing the green in diagonal scans. If that makes no sense, take a paper with an RGGB raster on it and rotate it by 45°: you will see that all the green samples align in a nice raster.
Got it. Besides more code (you said it is fine, so OK; in that case the width changes every line, not too difficult to handle but more complex code), and as we read memory in a "random" order (not sequential, e.g. 0,3 then 1,2 then 2,1 then 3,0), I am afraid of some memory speed impact (maybe low compared to a potential compression gain).
It should be possible to limit the impact by reordering the samples in blocks to maximize cache use before encoding / after decoding the lines. But that's something for the future; it's too early to optimize this, I think.
Can we have this deal: I test (with crappy but working FFmpeg code for the encoder and decoder, same framemd5 for the input and the encode-decode output) the compression ratio with the 45° rotation for green, on the sample files I have, and then, based on the results, we choose between the different options listed in this discussion?
I think design choices for "IETF standards/drafts" are not made by such deals ;) Rather, we should try to come up with the best "simple" solution which leaves it open to improvement in the future, for example with better predictors for the planes. "Simple", as it seems we don't have any volunteer to do a "best" one. But we should leave the path to "best" open, and maybe after we take some steps toward "simple", someone will enjoy the work so much that he just continues, and we move a bit beyond the simplest and towards "best" before we release the design in the spec. We will see, but yes, I agree we should try to get something done and into the actual specification for Bayer support.
I think you overestimate the abilities and interest of students.
Possibly. (I was just thinking that AI is currently hyped.)
Try storing the green in diagonal scans. If that makes no sense, take a paper with an RGGB raster on it and rotate it by 45°: you will see that all the green samples align in a nice raster.
Indeed. This is the same thing we do when digitising Dufaycolor (in that case by rotating the sensor by ca. 23°, in order to obtain horizontally and vertically aligned «pixels»). Of course, it also reduces aliasing.