h264-reader convenience methods?

convenience methods?

Open scottlamb opened this issue 3 years ago • 10 comments

What are your feelings on additional convenience methods / methods that interpret the raw fields? I find the H.264 spec rather dense and want to get a few "simple" things without studying it in detail. I imagine other folks will be in the same boat.

Here are some I'd like to have and why I think it's better for the crate to compute them than for callers to do so (badly):

rfc6381_codec. I see that your lowly crate calculates this here this. My moonfire-nvr calculates it here. Seems a little obscure/annoying.
SeqParameterSet::pixel_dimensions()Looks like it's specified in terms of macroblocks and map units. I'm unsure if you're supposed to apply cropping parameters to support frames that aren't a multiple of those things.
TimingInfo: max frame rate (or maybe min time scale units per frame?) I'd have expected this to be super straightforward but I got confused about mentions of interlaced video and frame doubling/tripling, references to a variety of equations with abbreviated names, etc.
max bitrate. Also given in terms of macroblocks / map units. And I'm confused about the NAL vs VCL HRD parameters.

Jul 13 '20 03:07 scottlamb

Sounds great!

For rfc6381, it would be great if there was a separate crate for this; I have other places I could use this. Was thinking something providing an enum Codec { Avc1 { ... }, etc } with appropriate impl ToString/TryFrom. If you want to see that, I can set something up.
For dimensions, you are right about cropping, and I understand pixel format / interlacing also plays a part. I think the algorithm lowly uses came about through combination of reading forum posts and cross-referencing the spec. Not well tested yet, but I can move the code to this crate.
the timing model is something I've not quite got worked out. I'm sure I've got code somewhere with frames = frames * 2; // TODO: why do we need this :sweat_smile: .
I have not seriously looked at the bitrate stuff; it looked hard so I tried to make do with mpegts-level signalling :sweat:

Think I can make some quick additions for the first two; for the second two I'd be happy to add helpers, but can't commit to researching the proper implementations right now.

Jul 13 '20 08:07 dholroyd

I hope 44ebdffdb82b029b6f718533face733e79ac8586 addresses the second item in the list (but feel free to propose adjustments).

Jul 13 '20 09:07 dholroyd

Thank you!

For rfc6381, it would be great if there was a separate crate for this; I have other places I could use this. Was thinking something providing an enum Codec { Avc1 { ... }, etc } with appropriate impl ToString/TryFrom. If you want to see that, I can set something up.

That'd be awesome, thank you!

(On the subject of other codecs and crates: do you have any plans for H.265? no pressure.)

I hope 44ebdff addresses the second item in the list (but feel free to propose adjustments).

Thanks!

I'll start looking into the second two then.

I'm already having second thoughts about bitrate. It looks like unfortunately even defining an interface for it is a bit of a mess because there's not just one result:

First, there's multiple CPBs. I'm not quite sure why yet; maybe because of scalable video coding.

Then there's VUI vs NAL. I haven't found an explanation in the spec itself yet, but I found this snippet on someone's website:

The difference between NAL HRD and VCL HRD is whether SPS/PPS/SEIs are counted or not in HRD calculations. In NAL HRD mode all non-VLC NALs (i.e. SPS, PPS, SEIs) as well as VCL NALs (i.e. Slices) are counted in HRD while in VCL HRD mode non-VCL NALs are ignored (consequently in VCL HRD mode access unit may be smaller than that in NAL HRD).

In my opinion the reason for VCL HRD mode: in some applications SPS/PPS are given a priory or signaled out-of-band, therefore there is no sense to include these NALs in HRD process.

Of course it's optional to specify HRD parameters. There's also a max allowable one defined by the level and profile. One of the CPBs has to be within that max (see A.3.1.i).

Jul 13 '20 15:07 scottlamb

Here's the initial sketch of codec support.

My thinking is that with a dependency on this new crate, we can then extend the Sps type like,

    fn rfc6381(&self) -> rfc6381_codec::Codec {
        rfc6381_codec::Codec::avc1(self.profile_idc.0, self.constraint_flags.0, self.level_idc)
    }

so to get the string you want it should be sps.rfc6381().to_string() or format("{}", sps.rfc6381())

Jul 13 '20 22:07 dholroyd

do you have any plans for H.265?

I understand that a lot of the infrastructure like NALU syntax is the same, so yeah I'd like to make it easy to either support H.265 in this crate, maybe renamed (h26x-reader), or separate the generic stuff into third crate (h26x-reader + h264-reader + h265-reader).

I've not really got any h265 material passing through my hands at the moment to motivate work on that, but hopefully that'll change at some point!

I raised #10 to track that thought.

Jul 13 '20 22:07 dholroyd

I hope 44ebdff addresses the second item in the list (but feel free to propose adjustments).

I notice the Chromium codebase takes care to avoid integer overflow in the equivalent spot. Sorry for the churn, but pixel_dimensions() should be changed to return Result.

Aug 02 '20 21:08 dholroyd

I'm making progress on my pure-Rust RTSP client so taking a fresh look at this stuff.

rfc6381_codec: that proposed API looks nice, thanks!
- a nit with the new crate: I realized recently that the hex digits after avc1 are supposed to be uppercase, see avcoti in the RFC 6381 syntax. Not sure if anything actually cares.
- On the subject of constraint_flags.0, could that just be public? Then you could also construct an AVCDecoderConfigurationRecord without jumping through hoops to get this byte (which ISO/IEC 14496-10 section 5.2.4.1.1 calls profile_compatibility)
- Aside: It might also be nice to have a convenience method to construct an AVCDecoderConfigurationRecord but I'm not exactly sure what I want yet. It looks like I need to handle both out-of-band and in-band parameter sets, including them updating mid-session as mentioned in RFC 6184 section 8.4, and I'm not exactly sure how I'm supposed to handle multiple SPS/PPSs either.
SeqParameterSet::pixel_dimensions(): I'm using your calculation. So far so good!
TimingInfo: I'll take a look at this sooner or later.
max bitrate: I've given up on this. In addition to being complicated to compute, it looks like it's a useless overestimate, see here.

May 02 '21 06:05 scottlamb

Regarding rfc6381_codec, you've reminded me of a pending change that I've just committed, adding dependencies on mpeg4-audio-const and mp4ra-rust crates, which define the constants which rfc6381_codec had just defined locally before. Hopefully this does not seem like over-engineering things! :sweat_smile:

May 03 '21 21:05 dholroyd

It might also be nice to have a convenience method to construct an AVCDecoderConfigurationRecord

Any existing crate/type you had in mind? I assume it doesn't make sense to define a type local to h264-reader since it wouldn't interoperate with any existing MP4 writer.

May 13 '21 22:05 dholroyd

I'm not aware of anything. I was thinking of just being able to get the raw bytes as a Vec<u8>, but yeah in my dream world there'd be a broadly used non-codec-specific, non-application-specific video and audio parameters crate. My high-level RTSP crate isn't the best place for it, but I'm working toward making something like there which can produce:

the raw bytes to give to eg ffmpeg as "extra data" if you want to actually decode the stream, in AVC's case the AVCDecoderConfigurationRecord
likewise, something to help produce an ISO BMFF sample entry box. The above is the contents of the avcC box, which is part of the avc1 box. I don't know exactly where to draw the line between codec parameters and .mp4 generation code, but I do know I sometimes want to modify the avc1 to add in a pasp when stupid camera companies don't properly set their VUI, so if the caller is given a whole sample entry box they may end up altering it anyway. [edit: I'll likely end up with a "quirks" parameter to my sample entry maker for stuff like this, but I don't think that really belongs in say h264-reader.]
high-level stuff similar to what I've been asking about here: pixel dimensions, pixel aspect ratio (if known), frame rate (if known), bitrate bounds (except they're too high to be useful anyway), maybe colorspace, etc.

This is the kind of stuff that I'd imagined the rust-av project doing but I'm not sure if it's gone anywhere. [edit: I probably should check in with them, but I just want to get something working right now.] Seems like the webrtc-rs stuff is also in adjacent space (they have their own RTP and RTCP crates now also).

btw, I'm also working on stuff for (de)muxing codecs from RTP packets to access units, which in theory might make more sense in your rtp-rs crate, with the caveats that (a) my code right now assumes no dropped RTP packets which probably will work for interleaved RTSP but not whatever other ways folks use RTP, and (b) I'm working with Bytes at the moment which may be more "opinionated" than you want.

fwiw, my (very much a work in progress) crate produces the AVCDecoderConfiguration here, although my code is sketchy (only handles a single SPS and PPS).

May 14 '21 00:05 scottlamb

h264-reader h264-reader copied to clipboard

convenience methods?

h264-reader
h264-reader copied to clipboard