h264-reader icon indicating copy to clipboard operation
h264-reader copied to clipboard

Handling of `rbsp_trailing_bits` is not robust

Open DCNick3 opened this issue 5 months ago • 3 comments

The H.264 spec defines rbsp_trailing_bits to be:

rbsp_trailing_bits( ) {
    rbsp_stop_one_bit /* equal to 1 */
    while( !byte_aligned( ) )
        rbsp_alignment_zero_bit /* equal to 0 */
}

In h264-reader this syntactic element is handled by the finish_rbsp function:

fn finish_rbsp(mut self) -> Result<(), BitReaderError> {
    // The next bit is expected to be the final one bit.
    if !self
        .reader
        .read_bit()
        .map_err(|e| BitReaderError::ReaderErrorFor("finish", e))?
    {
        // It was a zero! Determine if we're past the end or haven't reached it yet.
        match self.reader.read_unary1() {
            Err(e) => return Err(BitReaderError::ReaderErrorFor("finish", e)),
            Ok(_) => return Err(BitReaderError::RemainingData),
        }
    }
    // All remaining bits in the stream must then be zeros.
    match self.reader.read_unary1() {
        Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => Ok(()),
        Err(e) => Err(BitReaderError::ReaderErrorFor("finish", e)),
        Ok(_) => Err(BitReaderError::RemainingData),
    }
}

This function expects that SPS does not have any trailing data at the end and will crash with RemainingData if this is the case. The spec doesn't seem to mandate that this is the case though, and other decoder implementations, like ffmpeg will follow it and only consume zero bits up until the byte boundary, ignoring the rest.

Here's a file from ffmpeg's test suite that will fail to parse with h264-reader, but will otherwise play fine in libavcodec-based programs crew_cif.h264.zip. It has two extra bytes in the SPS: "\x00\x0a"

DCNick3 avatar Jul 24 '25 15:07 DCNick3

That's not consistent with my understanding of the spec.

For one, the following excerpt doesn't make sense to me if there can be 1 bits following the rbsp_trailing_bits.

more_rbsp_data( ) is specified as follows: ... the RBSP data is searched for the last (least significant, right-most) bit equal to 1 that is present in the RBSP. Given the position of this bit, which is the first bit (rbsp_stop_one_bit) of the rbsp_trailing_bits( ) syntax structure ...

For another, nal_unit does not define any bytes following rbsp_byte[...], and seq_parameter_set_rbsp doesn't define anything following the rbsp_trailing_bits (nor do any *_rbsp syntax elements AFAICT). So what are those \x00\x0a bytes in that example? I think they're actually an "end of sequence" NAL, which somehow was incorrectly stuck on the end of the SPS NAL. (Maybe some confusion along the path from a hardware encoder to a muxing application. Hard to say for sure without knowing where they got that file.)

In fact I think one could interpret the spec portion you've quoted even more strictly to say that the RBSP should not contain any trailing zero bytes. (Although Annex B encoding does allow/consume trailing zero bytes between NALs so in that case they'll never reach the RBSP parser anyway. But there are some places where NALs are not supposed to go through Annex B encoding—such as when AVCC length-prefixed encoding is used instead, inside RTP packets, and within SDP fmtp media attributes—where this distinction may be more relevant.)

That said, I hear the pragmatic argument that there's input that ffmpeg does something useful with that h264-reader produces an error with. This came up before in https://github.com/scottlamb/retina/issues/102 and I opted to implement a TolerantBitReader wrapper which warns on the first case of extra data but always proceeds. You can do the same in your own h264-reader using code. Whether h264-reader should include some such wrapper or always just be tolerant, I think is up to @dholroyd .

scottlamb avatar Jul 24 '25 16:07 scottlamb

Thanks for pointing to the relevant discussion!

I have actually prepared a PR for making h264-reader tolerant to such issues and will send it soon. Of course, whether to merge it is up to @dholroyd.

Although in my case the "extra" bytes seem to be due to user (my) error. The file is used by an ffmpeg unit test and is actually not an Annex B-formatted stream. Instead, it has NAL units prefixed with sizes as u16s, while also having start codes. ffmpeg, being very liberal with what it accepts, finds the start codes and ignores the rest.

DCNick3 avatar Jul 24 '25 18:07 DCNick3

fwiw, I'm definitely of two minds about accepting crappy input in general:

  • There's a lot of bad code/data that's out there that's not easy to change, and it's useful to be able to interoperate with it, so the old-school Postel's law applies. My Retina library explicitly says "Works around brokenness in cheap closed-source cameras" for this reason, and I've made many changes to it along those lines. For example, I wrote above about "places where NALs are not supposed to go through Annex B encoding" but actually Retina has added Annex B decoding to these one-to-one as I've discovered some camera that actually does embed multiple NALs via Annex B encoding in each of these places.
  • On the other hand, there can be security consequences to fudging inputs in some contexts, particularly if two different systems do it in a different way. And I think a lot of the crap that is out there exists because someone said "works with ffmpeg, good enough for me". If ffmpeg were stricter, we wouldn't have these problems. Similar story with web browsers accepting bad HTML, and with many other protocols/data formats. So it's nice to be able to be strict or to point out problems without actually failing.

scottlamb avatar Jul 24 '25 18:07 scottlamb