quicktime_video_hack

Questions about clock refs, audio/video sync and timing in general

Open • j-santander opened this issue 3 years ago • 1 comment

This should probably not be an issue, as these are just a few observations and questions. I wonder whether you have more information on the handling of clocks and timing (especially the sample PTS).

From what I've gathered, six clock refs appear in the messages. To tell the truth, I'm not entirely sure what a "clock ref" is, so that's probably my first question.

So far I'm assuming that they are unique identifiers (with 0x1 indicating an invalid reference), each representing a source of timing information. Messages in the exchange seem to be "addressed" to a clock_ref, and some of them seem to carry timing information.
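To make the "unique identifier" reading concrete, here is a minimal Go sketch. `ClockRef`, `Clock`, `ClockRegistry` and `Resolve` are hypothetical names, not the project's API, and the 64-bit width is an assumption; the only facts taken from the thread are that refs behave like IDs that messages are addressed to, and that 0x1 marks an invalid reference.

```go
package main

import "fmt"

// ClockRef is modeled as an opaque 64-bit identifier (width is an assumption).
type ClockRef uint64

// InvalidClockRef is the value observed in the thread to mean "no clock".
const InvalidClockRef ClockRef = 0x1

// IsValid reports whether the ref can point at a real clock.
func (r ClockRef) IsValid() bool { return r != InvalidClockRef }

// Clock is a hypothetical stand-in for the timing source a ref names
// (possibly a CMClock instance on the device side).
type Clock struct {
	Name string
}

// ClockRegistry maps refs to clocks so an incoming message can be routed to
// the clock it is addressed to.
type ClockRegistry map[ClockRef]*Clock

// Resolve looks up the clock a message is addressed to.
func (reg ClockRegistry) Resolve(r ClockRef) (*Clock, error) {
	if !r.IsValid() {
		return nil, fmt.Errorf("clock ref %#x is the invalid ref", uint64(r))
	}
	c, ok := reg[r]
	if !ok {
		return nil, fmt.Errorf("unknown clock ref %#x", uint64(r))
	}
	return c, nil
}

func main() {
	// The ref value below is made up for illustration, not taken from a capture.
	reg := ClockRegistry{
		0x7fa66ce20cb0: {Name: "LOCAL_VIDEO_CLOCK"},
	}
	if c, err := reg.Resolve(0x7fa66ce20cb0); err == nil {
		fmt.Println("addressed to", c.Name)
	}
	if _, err := reg.Resolve(InvalidClockRef); err != nil {
		fmt.Println(err)
	}
}
```

Keeping the invalid check separate from the map lookup lets a parser distinguish "deliberately no clock" from "a ref we never saw defined".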

Clock_Ref          | Defined in   | Message addressed to
DEVICE_AUDIO_CLOCK | CWPA         | HPA1, HPA0
DEVICE_VIDEO_CLOCK | CVRP         | NEED
DEVICE_CLOCK       | TBAS         |
LOCAL_AUDIO_CLOCK  | RPLY to CWPA | AFMT, EAT!, GO!, SKEW, STOP
LOCAL_VIDEO_CLOCK  | RPLY to CVRP | CLOK, FEED, RELS, SPRP, SRAT, TBAS, TJMP
LOCAL_CLOCK        | RPLY to CLOK | RELS, TIME
  • LOCAL_AUDIO_CLOCK is used in the SKEW computation.
  • LOCAL_CLOCK is used in the TIME RPLY.
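The SKEW exchange presumably asks how fast one clock runs relative to another. The thread does not give the formula, so the sketch below uses a generic estimate: the ratio of time elapsed on a local clock to time elapsed on a reference clock over the same interval, scaled to a nominal 48 kHz rate. `CMTime` is a stripped-down stand-in for CoreMedia's type, `EstimateSkew` is a hypothetical helper rather than the project's implementation, and which clocks supply the start and end times is an assumption.

```go
package main

import "fmt"

// CMTime keeps only the value/timescale pair of a CoreMedia time.
type CMTime struct {
	Value     int64
	Timescale uint32
}

// Seconds converts value/timescale to seconds.
func (t CMTime) Seconds() float64 {
	return float64(t.Value) / float64(t.Timescale)
}

// EstimateSkew (hypothetical) compares the time elapsed on a local clock with
// the time elapsed on a reference clock over the same interval and scales the
// ratio to nominalRate. Equal clock speeds yield exactly nominalRate; a local
// clock that runs fast yields a larger value.
func EstimateSkew(localStart, localEnd, refStart, refEnd CMTime, nominalRate float64) (float64, error) {
	refElapsed := refEnd.Seconds() - refStart.Seconds()
	if refElapsed <= 0 {
		return 0, fmt.Errorf("reference clock did not advance")
	}
	localElapsed := localEnd.Seconds() - localStart.Seconds()
	return nominalRate * localElapsed / refElapsed, nil
}

func main() {
	// Made-up measurements: the local clock (nanosecond timescale) advanced
	// 10.001 s while the reference clock (48 kHz timescale) advanced 10 s.
	localStart := CMTime{Value: 0, Timescale: 1_000_000_000}
	localEnd := CMTime{Value: 10_001_000_000, Timescale: 1_000_000_000}
	refStart := CMTime{Value: 0, Timescale: 48000}
	refEnd := CMTime{Value: 480_000, Timescale: 48000}

	skew, err := EstimateSkew(localStart, localEnd, refStart, refEnd, 48000)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("skew: %.1f\n", skew)
}
```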

Regarding the PTS:

  • Video PTS: this is related to the time we send in the RPLY to the TIME message, but the first PTS is earlier than the CMTime value we sent (note: in gst_adapter.go you have a FIXME mentioning a weird large timestamp; it's actually a negative time). Basically, it seems that:
    • We receive a TIME request and reply with a value, presumably our current time.
    • Then we receive an SRAT (presumably carrying the device's current time, synchronized with the value we provided).
    • Then we receive another TIME request and reply with our new value.
    • Finally, the first FEED message arrives with a PTS earlier than the first TIME value we provided.
  • Audio PTS: this always seems to start close to 0 (2052/48000) and to be independent of the other clocks.
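Two of the observations above are easy to check numerically: 2052/48000 is about 0.043 s, and a negative PTS read as an unsigned 64-bit integer turns into a value just below 2^64, which would explain a "weird large timestamp". The sketch assumes a little-endian 8-byte value field and a nanosecond timescale for the video example; both, and the -0.5 s offset, are illustrative assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// CMTime models only the value/timescale pair of a CoreMedia time.
type CMTime struct {
	Value     int64
	Timescale uint32
}

// Seconds converts the rational time value/timescale to seconds.
func (t CMTime) Seconds() float64 { return float64(t.Value) / float64(t.Timescale) }

// decodeValue reads an 8-byte CMTime value field as a signed integer.
// Little-endian byte order is an assumption about the wire format.
func decodeValue(b []byte) int64 {
	return int64(binary.LittleEndian.Uint64(b))
}

func main() {
	// First audio PTS reported in the thread: 2052 samples at 48 kHz.
	audio := CMTime{Value: 2052, Timescale: 48000}
	fmt.Printf("first audio PTS: %.5f s\n", audio.Seconds())

	// Hypothetical video PTS 0.5 s before the TIME reply, nanosecond timescale.
	// Encode it the way it might appear on the wire...
	video := int64(-500_000_000)
	var raw [8]byte
	binary.LittleEndian.PutUint64(raw[:], uint64(video))

	// ...then decode it both ways: unsigned gives a value just below 2^64,
	// signed recovers the negative offset.
	fmt.Println("as uint64:", binary.LittleEndian.Uint64(raw[:]))
	fmt.Println("as int64: ", decodeValue(raw[:]))
	fmt.Printf("relative to TIME reply: %.3f s\n", CMTime{Value: decodeValue(raw[:]), Timescale: 1_000_000_000}.Seconds())
}
```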

Now, it seems that audio PTS and video PTS work differently. QuickTime seems to assume that the first audio sample and the first video sample share the same PTS (if you look at the MOV produced from a capture, both streams start at CTS=0). I'm not sure how it handles the case where we start recording with a blank screen on the device (where we do receive EAT! but not FEED).

One final observation: the device does not sustain the promised 60 FPS; in many cases it just skips one, two or even three frames. Each frame's duration is still reported as 1/60 s, but the PTS values are correct. Sending NEED messages more frequently does not help (although sending NEED less frequently does make frames arrive less frequently).
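Since the PTS values stay correct while the reported duration is always 1/60 s, dropped frames can be inferred from the gap between consecutive timestamps. `SkippedFrames` is a hypothetical helper that rounds each gap to whole 1/60 s periods; using the measured gap instead of the advertised duration is a design choice of this sketch, not something the thread prescribes.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// nominalFrame is the advertised frame duration (1/60 s).
const nominalFrame = time.Second / 60

// SkippedFrames estimates how many frames were dropped between two
// consecutive FEED timestamps by rounding the gap to whole frame periods.
func SkippedFrames(prev, cur time.Duration) int {
	periods := math.Round(float64(cur-prev) / float64(nominalFrame))
	if periods < 1 {
		return 0
	}
	return int(periods) - 1
}

func main() {
	// Hypothetical PTS sequence: one frame missing between the 2nd and 3rd
	// samples, two missing between the 3rd and 4th.
	pts := []time.Duration{0, nominalFrame, 3 * nominalFrame, 6 * nominalFrame}
	for i := 1; i < len(pts); i++ {
		gap := pts[i] - pts[i-1]
		fmt.Printf("frame %d: gap %v, skipped %d\n", i, gap, SkippedFrames(pts[i-1], pts[i]))
	}
}
```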

In general, when you start, the first burst of FEED messages arrives at the right rate (even without sending NEED); after that, you are likely to get fewer than 60 FPS.

j-santander, May 29 '21 19:05

I haven't actually looked into their decompiled code, so I can only guess myself. But I came to the same conclusion as you did: I think the clock_ref is just a unique ID pointing to a CMClock instance.

Interesting point about the FPS! Thanks for these observations; I will add them to my documentation. I also wanted to review my timing implementation in general. I suspect the clock skew part in particular is wrong.

danielpaulus, Jun 09 '21 07:06