Fix timestamps across concatenated TSes
Problem When a playlist of TS assets is played, at the point of stream change the following TS may have one or more frame timestamps repeated. See this example with the last 5 frames of the previous TS and the first 5 frames of the next one. Current main:
onQueueInputBuffer timeUs=1000004000000 size=10647
onQueueInputBuffer timeUs=1000003920000 size=5095
onQueueInputBuffer timeUs=1000003880000 size=1630
onQueueInputBuffer timeUs=1000003960000 size=1728
onQueueInputBuffer timeUs=1000004040000 size=3313
onQueueInputBuffer timeUs=1000004000000 size=951 KEY
onQueueInputBuffer timeUs=1000004080000 size=45
onQueueInputBuffer timeUs=1000004040000 size=43
onQueueInputBuffer timeUs=1000004120000 size=132
onQueueInputBuffer timeUs=1000004160000 size=241
Correct sequence should be:
onQueueInputBuffer timeUs=1000004000000 size=10647
onQueueInputBuffer timeUs=1000003920000 size=5095
onQueueInputBuffer timeUs=1000003880000 size=1630
onQueueInputBuffer timeUs=1000003960000 size=1728
onQueueInputBuffer timeUs=1000004040000 size=3313
onQueueInputBuffer timeUs=1000004080000 size=951 KEY
onQueueInputBuffer timeUs=1000004160000 size=45
onQueueInputBuffer timeUs=1000004120000 size=43
onQueueInputBuffer timeUs=1000004200000 size=132
onQueueInputBuffer timeUs=1000004240000 size=241
Time stamps us 1000004000000 and 1000004040000 get repeated. This has an impact in applications where timestamp accuracy is required.
Root cause From our analysis the root cause seems to be the calculated asset duration. The duration of a Transport Stream asset gets calculated as difference between the very last PCR and the very first PCR encountered in the stream. However this seems to bring some issues:
- The calculation is likely frame inaccurate, because there is no guarantee that the first and last PCRs happen on the very first and the very last frame. As far as we can read, ISO/IEC 13818-1 does not seem to mandate PCRs being repeated for each Access Unit. For ex, it's easy to find TS assets where the last seen PCR is in the frame before the last one, ditto for the first one.
- The calculation misses the duration of the very first frame, otherwise for ex. for a two frames video the duration would end up being just one frame duration.
Proposed solution As a mitigation we propose here a change to use DTS, in pretty much the same way. Since the DTS is monotonic it allows to easily calculate both the frame duration (point 2) and the asset duration. Specifically: frame duration as difference between two consecutive DTSes (2nd and 1st) and asset duration as difference between last DTS and first DTS, and corrected by adding one frame duration.
In the unlikely event the DTS is not available, this PR would still fall back to using the PCR as in current main.
Hi @rohitjoins, is there any additional info I can provide to help you reviewing this PR?
Hi @dsparano, I haven't got time yet to look at this yet. Will review this soon and let you know. Thank you for the patience.
Hi - would it be possible to open a new PR from an individual-owned fork? We can't push changes to organization-owned forks like this one unless we have collaborator access. If that's not possible then we can still merge this PR but it will result in an 'evil' merge. See more info here: https://github.com/androidx/media/blob/release/CONTRIBUTING.md#push-access-to-pr-branches
@rohitjoins I've given you write access on our forked repo, please let me know if you cannot write.
Repeating again, please check the failing test cases in TsExtractorTest. Dump files shows a change in duration for the sample files, how do we explain the change and was it more accurate to derive the duration from PCR or DTS?
Re the TsDurationReaderTest.
I've checked the test TS bbb_2500ms.ts using the DVB Inspector analyser:
- First PCR: 0x1206420 (18900000) => 0:00:00.700000 appears at the very first frame
- Last PCR: 0x5265C00 (86400000) => 0:00:03.200000 appears at the penultimate frame difference is 2.5 secs.
However:
- Frame count: 58
- Fps: 24
- As a result the real duration is 24/58 secs = 2.416666 secs
To complicate things the DTS and PTS values are a bit strange:
- First encoded frame: pts: 0x21822 (137250) => 0:00:01.5250 dts: 0x1EC30 (126000) => 0:00:01.4000
- Second encoded frame: pts: 0x24414 (148500) => 0:00:01.6500 dts: 0x21822 (137250) => 0:00:01.5250
- Last encoded frame: pts: 0x57864 (358500) => 0:00:03.9833 dts: 0x569BE (354750) => 0:00:03.9416
(Note: the last encoded frame has the highest PTS so it it the last also in presentation order)
The PTS/DTS are on a 90KHz tick so given that the stream is 24fps this gives a 90000/24 = 3750 ticks/frame.
What happens with this encode is that the last-first DTS diff is 354750-126000 = 228750 ticks equal to 61 frames, plus the last (or first depending how you see it) this gives 62 frames, which seems wrong.
Also, this PR assumes equally spaced DTS (which seems a fair assumption) while the diff second-first DTS here is 137250-126000 = 11250 which is 3 frames duration.
Long story short I'm not entirely sure this is a fully conformant stream, or one meaningful to be used for testing.
On the other hand if we take the test file sample_h264.ts:
-
First PCR: 0x1206420 (18900000) => 0:00:00.700000 appears at the very first frame
-
Last PCR: 0x2932E00 (43200000) => 0:00:01.600000 appears at two frames behind the last one (i.e. index 27) difference is 0.9 secs
-
Frame count: 30
-
Fps: 30
-
As a result the real duration is 30/30 secs = 1 sec
Timestamps:
- First encoded frame: pts: 0x203A0 (132000) => 0:00:01.4666 dts: 0x1EC30 (126000) => 0:00:01.4000
- Second encoded frame: pts: 0x20F58 (135000) => 0:00:01.5000 dts: 0x1F7E8 (129000) => 0:00:01.4333
- Last encoded frame: pts: 0x35778 (219000) => 0:00:02.4333 dts: 0x34008 (213000) => 0:00:02.3666
Therefore: DTS_1 - DTS_0 = 129000 - 126000 = 3000 DTS_29 - DTS_0 = 213000 - 126000 = 87000
Duration based on DTS: 87000 + 3000 = 90000 which is 1 sec, which is correct.
Not sure how the original bbb_2500ms.ts was encoded, anyway I just did:
ffmpeg -i https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_1080p_stereo.avi -t 00:00:02.500 bbb_2500ms.ts
and the resulting asset is totally equivalent with the current one, but with correct frame count and timestamps, so I've added it to this PR.
I've also added .gitattributes files where needed to avoid false negatives on tests running in Windows, due to the different line endings.
I'm currently looking at the tests, many dumps need updating because of the different durations we now get. There are a few failings I'm currently analysing.
Will this fix reduce the chance of "Unexpected audio track timestamp discontinuity" being thrown? test link need to set mineType to hls about watch 30 sec will see error
@FongMi I'm struggling to find a device supporting audio mpeg-L2, any suggestion?
@FongMi I'm struggling to find a device supporting audio mpeg-L2, any suggestion?
Sorry, I don't have suggestion, I use Xiaomi 13...
@FongMi could play your content on a Samsung S21, the Unexpected audio track timestamp discontinuity is still there :( it seems more an encoder issue to me as there are jumps in audio presentation timestamps.
@rohitjoins I've done a few updates and refreshed many dump files. Tests are all passing now, if you could please check the changes.
Hi @dsparano,
Thank you for addressing the comments. Please be patient as this is a big change. Will get this reviewed internally if I don't find anything to flag before that.
Note: I'm going on vacation for 3 weeks soon, so expect a delay.
Hi @rohitjoins have you had the chance to make progress with this review?
Hi @rohitjoins I've rebased and updated a few dump files, can you please resume your review?
Hi @dsparano,
I'll take a look again on this PR and will let you know if there are any further concerns to merge this.