quicktime_video_hack
Questions about clock refs, audio/video sync and timing in general
This should probably not be an issue, as it is just a few observations and questions. I wonder if you have more information on the handling of clocks and timing (especially the sample PTS).
From what I've gathered, 6 clock refs appear in the messages. To tell the truth, I'm not entirely sure what a "clock ref" is, so that's probably the first question.
So far I'm assuming that they're unique identifiers (0x1 indicating an invalid reference) that each represent a source of timing information. Messages in the exchange seem to be "addressed" to a clock_ref, and in some cases timing information seems to be sent along.
Clock_Ref | Defined in | Message addressed to |
---|---|---|
DEVICE_AUDIO_CLOCK | CWPA | HPA1, HPA0 |
DEVICE_VIDEO_CLOCK | CVRP | NEED |
DEVICE_CLOCK | TBAS | |
LOCAL_AUDIO_CLOCK | RPLY to CWPA | AFMT, EAT!, GO!, SKEW, STOP |
LOCAL_VIDEO_CLOCK | RPLY to CVRP | CLOK, FEED, RELS, SPRP, SRAT, TBAS, TJMP |
LOCAL_CLOCK | RPLY to CLOK | RELS, TIME |
- LOCAL_AUDIO_CLOCK is used in the SKEW computation.
- LOCAL_CLOCK is used in the TIME RPLY.
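
To make the table easier to query while debugging, here is a minimal Go sketch that encodes it as a lookup. The names `ClockRef`, `InvalidClockRef`, `addressedTo` and `ClocksFor` are made up for illustration and are not part of the project; the 64-bit width of the reference is an assumption, and only the 0x1 "invalid" value and the message names come from the observations above.

```go
package main

import (
	"fmt"
	"sort"
)

// ClockRef is an opaque identifier for a timing source.
// Assumption: the 64-bit width is illustrative, not confirmed here.
type ClockRef uint64

// InvalidClockRef is the value observed to mean "no valid reference".
const InvalidClockRef ClockRef = 0x1

// Valid reports whether the reference differs from the invalid marker.
func (c ClockRef) Valid() bool { return c != InvalidClockRef }

// addressedTo mirrors the "Message addressed to" column of the table.
var addressedTo = map[string][]string{
	"DEVICE_AUDIO_CLOCK": {"HPA1", "HPA0"},
	"DEVICE_VIDEO_CLOCK": {"NEED"},
	"DEVICE_CLOCK":       {},
	"LOCAL_AUDIO_CLOCK":  {"AFMT", "EAT!", "GO!", "SKEW", "STOP"},
	"LOCAL_VIDEO_CLOCK":  {"CLOK", "FEED", "RELS", "SPRP", "SRAT", "TBAS", "TJMP"},
	"LOCAL_CLOCK":        {"RELS", "TIME"},
}

// ClocksFor returns, sorted by name, every clock that msg was seen addressed to.
func ClocksFor(msg string) []string {
	var out []string
	for clock, msgs := range addressedTo {
		for _, m := range msgs {
			if m == msg {
				out = append(out, clock)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(ClocksFor("RELS")) // RELS shows up for two different clocks
	fmt.Println(ClocksFor("NEED"))
	fmt.Println(ClockRef(0x1).Valid())
}
```

The lookup makes it visible that RELS is the only message in the table addressed to more than one clock.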
Regarding the PTS:
- Video PTS: This is related to the time we send in the RPLY to the TIME message, but the first PTS is earlier than the CMTime value we sent (note: in gst_adapter.go you have a FIXME that mentions a weird large timestamp; it's a negative time). Basically, the sequence seems to be:
  - We're asked for a TIME and we provide a value, I guess our current time.
  - Then we have an SRAT (I guess the device's current time, synchronized with what we provided).
  - Then we get a new TIME request and we provide our new value.
  - Finally, the first FEED message arrives with a PTS earlier than the first TIME we provided...
- Audio PTS: This seems to come always starting from close to 0 (2052/48000) and independent of other clocks.
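
To illustrate the two PTS observations, here is a small Go sketch. The `CMTime` struct is a simplified stand-in with only a value and a timescale (flags and epoch are left out), `AsSigned` is a hypothetical helper, and the raw value in `main` is invented for the example. It shows how a small negative time looks like a huge timestamp when the 64-bit value is read as unsigned, and what 2052/48000 amounts to in seconds.

```go
package main

import "fmt"

// CMTime is a simplified rational timestamp: Value / Timescale seconds.
// Assumption: only the two fields needed for this example are modeled.
type CMTime struct {
	Value     int64
	Timescale int32
}

// Seconds converts the rational time to floating-point seconds.
func (t CMTime) Seconds() float64 {
	return float64(t.Value) / float64(t.Timescale)
}

// AsSigned reinterprets a raw 64-bit value as a signed tick count.
func AsSigned(raw uint64) int64 {
	return int64(raw)
}

func main() {
	// First audio PTS as observed: 2052 ticks at 48 kHz.
	audio := CMTime{Value: 2052, Timescale: 48000}
	fmt.Printf("first audio PTS: %.5f s\n", audio.Seconds())

	// Hypothetical raw value: huge when unsigned, slightly negative when signed.
	raw := uint64(0xFFFFFFFFFFFFFC18)
	fmt.Println("unsigned:", raw)
	fmt.Println("signed:  ", AsSigned(raw))
}
```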
Now, it seems that audio PTS and video PTS work differently. QuickTime seems to assume that the first audio sample and the first video sample share the same PTS (if you look at the MOV produced from a capture, both streams start at CTS=0). I'm not sure how it handles the case where recording starts with a blank screen on the device (where we do receive EAT! but no FEED).
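
One way a recorder could end up with both streams starting at 0 is to subtract each stream's first PTS from every later sample. The Go sketch below shows that idea only; whether QuickTime actually does this is a guess, `Rebaser` is a hypothetical type, and the tick values are invented. It also shows why the blank-screen case is awkward: the video rebaser has no first sample to anchor on until a FEED arrives.

```go
package main

import "fmt"

// Rebaser shifts the timestamps of one stream so that its first sample lands at 0.
// Assumption: all PTS values passed in share a single timescale.
type Rebaser struct {
	first int64
	seen  bool
}

// Rebase returns pts relative to the first PTS this stream has seen.
func (r *Rebaser) Rebase(pts int64) int64 {
	if !r.seen {
		r.first = pts
		r.seen = true
	}
	return pts - r.first
}

func main() {
	var video, audio Rebaser

	// Invented values: video starts at a negative time, audio close to 0.
	fmt.Println(video.Rebase(-1000), video.Rebase(-500))
	fmt.Println(audio.Rebase(2052), audio.Rebase(3076))
}
```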
One final observation: the device does not keep the promised 60 FPS. In many cases it just skips one, two or even three frames. The reported frame duration is still 1/60 s even then, while the PTS values are correct. Sending frequent NEED messages does not help (but if you send NEED less frequently, you will get frames less frequently).
In general, when you start, the first burst of FEED messages arrives at the right rate (even without sending NEED); after that, you will likely get fewer than 60 FPS.
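
If the PTS values are correct while frames are skipped, the gap between consecutive PTS values should reveal how many frame slots went missing. Here is a rough Go sketch of that check; `SkippedFrames` is a hypothetical helper, and the timescale of 600 ticks per second in `main` is chosen only to keep the numbers small.

```go
package main

import "fmt"

// SkippedFrames estimates how many frame slots are missing between two
// consecutive PTS values, given the stream timescale (ticks per second)
// and the nominal frame rate. It rounds the gap to the nearest whole slot.
func SkippedFrames(prev, cur, timescale, fps int64) int64 {
	gap := cur - prev
	// slots = round(gap / (timescale / fps)), computed in integers.
	slots := (gap*fps + timescale/2) / timescale
	if slots < 1 {
		return 0
	}
	return slots - 1
}

func main() {
	// With a timescale of 600 and 60 FPS, one frame lasts 10 ticks.
	pts := []int64{0, 10, 20, 40, 50, 90}
	for i := 1; i < len(pts); i++ {
		fmt.Printf("pts %d -> %d: %d skipped\n",
			pts[i-1], pts[i], SkippedFrames(pts[i-1], pts[i], 600, 60))
	}
}
```

Rounding to the nearest slot tolerates small jitter in the PTS values, at the cost of misjudging gaps that fall close to half a frame.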
I haven't actually looked into their decompiled code, so I can only guess myself. But I came to the same conclusion as you did: I think the clock_ref is just a unique ID pointing to a CMClock instance.
Interesting for the FPS part! Thanks for these observations. I will add them to my documentation. I also wanted to review my implementation of timing in general. I think I have the clock skew part in particular wrong.