PyAV Is it possible to iterate a video stream over presentation (not decoding) time?

Overview

First, my understanding is that when I iterate over a video the way the documentation suggests, for frame in container.decode(video=0), the frames are returned in decoding order (which makes a lot of sense). Is this correct? I am basing this assumption on a specific example, which I show in the investigation below.

My question is: is it possible to iterate over the frames of a video in order of pts (presentation timestamps) instead of dts (decoding timestamps)? And if so, how would I do this?

Any pointers toward a solution that does not involve loading and re-ordering the whole video would be greatly appreciated.

Expected behavior

Does not apply

Actual behavior

Does not apply

Investigation

If I use the following code:

import av

container = av.open(str(video_file_path))

for index, frame in enumerate(container.decode(video=0)):
    print(f"dts = {frame.dts}, pts={frame.pts}, frame={frame.index}, time={frame.time:2.2f}")

I get the following output:

dts = 20, pts=19, frame=0, time=0.32
dts = 21, pts=22, frame=1, time=0.37
dts = 22, pts=21, frame=2, time=0.35
dts = 23, pts=23, frame=3, time=0.38
dts = 24, pts=20, frame=4, time=0.33
dts = 25, pts=26, frame=5, time=0.43
dts = 26, pts=25, frame=6, time=0.42
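
A quick arithmetic check of the time column: it appears to be pts multiplied by the stream time base. The sketch below assumes a time base of 1/60, which is the value reported for this file in the C log further down the thread; the variable names are illustrative only.

```python
from fractions import Fraction

# Assumption: the stream time base is 1/60 (taken from the C log later in the thread).
time_base = Fraction(1, 60)

# pts values in the order the frames were returned above.
pts_values = [19, 22, 21, 23, 20, 26, 25]

# Convert ticks to seconds and format them the same way as the print statement above.
times = [float(pts * time_base) for pts in pts_values]
formatted = [f"{t:2.2f}" for t in times]
print(formatted)
```

Under that assumption the values reproduce the time column above, which suggests that time simply follows pts and is non-monotonic whenever frames arrive out of pts order.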

Research

I have done the following:

[x] Checked the PyAV documentation
[x] Searched on Google
[x] Searched on Stack Overflow
[x] Looked through old GitHub issues
[x] Asked on PyAV Gitter
[x] ... and waited 72 hours for a response.

Additional context

We are working on an application for research purposes (academia). Our concern is that we need the precise timestamp of each frame, so it is very important for us to understand how to match timestamps to the correct frames. A possible solution is to extract all the frames and then re-order them by presentation time, but as the videos we deal with are rather large, this is infeasible due to memory concerns.

Aug 30 '22 11:08 h-mayorquin

Hi, this could be due to the codec settings used when encoding the video file.

Please check if the REORDER flag is set in the video file with the following code:

import av

container = av.open(str(video_file_path))
video = container.streams.video[0]

# Can you confirm REORDER?
print(video.codec_context.codec.properties)
# Can you confirm True?
print(video.codec_context.codec.reorder)

If you do not see these flags, it is possible that frame reordering was not set during encoding in the codec settings of the loaded video file, or that the codec does not support frame reordering.

A simple solution is to re-encode the video file using a codec that supports frame reordering. I believe popular codecs such as H.264 support frame reordering.

Sep 03 '22 08:09 hudrazine

Thanks a bunch for taking the time to answer. So far, the output is the following:

print(stream.codec_context.codec.properties.REORDER)
print(stream.codec_context.codec.reorder)
print(stream.codec_context.name)

REORDER
True
h264

But this is strange because the codec is already h264.

For context, is my reading correct that if reorder is True, the iteration should return the frames ordered by ascending presentation time (as the codec has the information needed to do this)?

Sep 03 '22 14:09 h-mayorquin

Oops, I have a correction to my earlier reply.

video.codec_context.codec.properties holds flags describing the properties of the codec itself. That is, it does not indicate the current state of the video. The same goes for reorder, since it is an alias. Sorry for the mistake.

But, as mentioned at the beginning, I believe that the frame reordering setting used when the video file was encoded may be affecting the order of iteration during decoding.

If you don't want to re-encode or otherwise modify the video file, I think you can use PriorityQueue and threads in the standard library queue module to get frames in a pts-based order without memory overflow.
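
A minimal sketch of that idea, using a bounded heap (heapq) in place of queue.PriorityQueue and threads, which is a simplification and not necessarily what was meant above. The function name iter_in_pts_order and the buffer_size parameter are hypothetical. It assumes every frame has a non-None pts and that no frame arrives more than buffer_size positions later than its place in presentation order. The stand-in frames below only carry a pts so that the example runs without a video file.

```python
import heapq
from itertools import count
from types import SimpleNamespace


def iter_in_pts_order(frames, buffer_size=16):
    """Yield frames in ascending pts while holding roughly buffer_size frames.

    Assumption: no frame arrives more than buffer_size positions late,
    and every frame has a non-None pts.
    """
    heap = []
    tie_breaker = count()  # avoids comparing frame objects when pts values tie
    for frame in frames:
        heapq.heappush(heap, (frame.pts, next(tie_breaker), frame))
        if len(heap) > buffer_size:
            # The buffer is full: release the frame with the smallest pts.
            yield heapq.heappop(heap)[2]
    # Input exhausted: drain what is left, still in pts order.
    while heap:
        yield heapq.heappop(heap)[2]


# Stand-in frames using the pts values reported at the top of the thread.
decoded = [SimpleNamespace(pts=p) for p in [19, 22, 21, 23, 20, 26, 25]]
ordered = [f.pts for f in iter_in_pts_order(decoded, buffer_size=4)]
print(ordered)
```

With PyAV the same generator could presumably wrap the decoder directly, for example iter_in_pts_order(container.decode(video=0)). Memory use is then bounded by buffer_size decoded frames instead of the whole video.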

Sep 03 '22 16:09 hudrazine

Hi, thanks for the clarification, so I can't really rely on those because they are flags about the encoder not about the state of video. I guess there is no way of knowing how the video is ordered other than decoding and seeing the timestamps.

Concerning your suggestion of re-encoding the video: can this be quicker than iterating over the video (I think not)? In the actual application we will be receiving new videos from different labs, and we just care about aligning timestamps to frames, so maybe this is not efficient. It seems that we will need to implement something along the lines of the second proposal (using a PriorityQueue).
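
One way to inspect the ordering by decoding, as described above, is to collect only the pts values (not the frames) and measure how far out of order they are. The helper below is a hypothetical sketch: min_reorder_buffer is not a PyAV function, and it assumes the pts values are unique. It returns how many frames a reordering buffer would need to hold before it can safely release the earliest one.

```python
def min_reorder_buffer(pts_values):
    """Return the largest number of positions by which any frame arrives late.

    A frame's "rank" is its position in presentation (sorted pts) order;
    its "arrival" is its position in the order it was decoded.
    Assumption: pts values are unique.
    """
    ranks = {pts: rank for rank, pts in enumerate(sorted(pts_values))}
    return max(
        (arrival - ranks[pts] for arrival, pts in enumerate(pts_values)),
        default=0,
    )


# pts values in the order reported at the top of the thread.
observed = [19, 22, 21, 23, 20, 26, 25]
needed = min_reorder_buffer(observed)
print(needed)  # 3: the frame with pts=20 arrives three positions late
```

A result of 0 means the frames already arrive in presentation order. Since only integers are stored, scanning a whole file this way is cheap in memory, although it still costs a full decode.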

Sep 03 '22 21:09 h-mayorquin

However, I am still confused about the output and wondering how to test the reliability of the results. I double-checked the same file with ffprobe and I get the following:

ffprobe -v error 161101_1705.avi -hide_banner -print_format flat -select_streams v:0 \
  -show_entries frame=coded_picture_number,pict_type,pkt_dts,pkt_dts_time,pkt_pts,pkt_pts_time,best_effort_timestamp,best_effort_timestamp_time \
  -read_intervals "%+#7"
frames.frame.0.pkt_pts="N/A"
frames.frame.0.pkt_pts_time="N/A"
frames.frame.0.pkt_dts=20
frames.frame.0.pkt_dts_time="0.333333"
frames.frame.0.best_effort_timestamp=20
frames.frame.0.best_effort_timestamp_time="0.333333"
frames.frame.0.pict_type="I"
frames.frame.0.coded_picture_number=0
frames.frame.1.pkt_pts="N/A"
frames.frame.1.pkt_pts_time="N/A"
frames.frame.1.pkt_dts=21
frames.frame.1.pkt_dts_time="0.350000"
frames.frame.1.best_effort_timestamp=21
frames.frame.1.best_effort_timestamp_time="0.350000"
frames.frame.1.pict_type="B"
frames.frame.1.coded_picture_number=3
frames.frame.2.pkt_pts="N/A"
frames.frame.2.pkt_pts_time="N/A"
frames.frame.2.pkt_dts=22
frames.frame.2.pkt_dts_time="0.366667"
frames.frame.2.best_effort_timestamp=22
frames.frame.2.best_effort_timestamp_time="0.366667"
frames.frame.2.pict_type="B"
frames.frame.2.coded_picture_number=2
frames.frame.3.pkt_pts="N/A"
frames.frame.3.pkt_pts_time="N/A"
frames.frame.3.pkt_dts=23
frames.frame.3.pkt_dts_time="0.383333"
frames.frame.3.best_effort_timestamp=23
frames.frame.3.best_effort_timestamp_time="0.383333"
frames.frame.3.pict_type="B"
frames.frame.3.coded_picture_number=4
frames.frame.4.pkt_pts="N/A"
frames.frame.4.pkt_pts_time="N/A"
frames.frame.4.pkt_dts=24
frames.frame.4.pkt_dts_time="0.400000"
frames.frame.4.best_effort_timestamp=24
frames.frame.4.best_effort_timestamp_time="0.400000"
frames.frame.4.pict_type="P"
frames.frame.4.coded_picture_number=1
frames.frame.5.pkt_pts="N/A"
frames.frame.5.pkt_pts_time="N/A"
frames.frame.5.pkt_dts="N/A"
frames.frame.5.pkt_dts_time="N/A"
frames.frame.5.best_effort_timestamp="N/A"
frames.frame.5.best_effort_timestamp_time="N/A"
frames.frame.5.pict_type="B"
frames.frame.5.coded_picture_number=6
frames.frame.6.pkt_pts="N/A"
frames.frame.6.pkt_pts_time="N/A"
frames.frame.6.pkt_dts="N/A"
frames.frame.6.pkt_dts_time="N/A"
frames.frame.6.best_effort_timestamp="N/A"
frames.frame.6.best_effort_timestamp_time="N/A"
frames.frame.6.pict_type="P"
frames.frame.6.coded_picture_number=5

So according to ffprobe the first frames don't even have presentation timestamps, and the coded_picture_number values don't coincide with the decoding order. Is this information supposed to match? Do they come from a different part of the container?
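
To compare these fields side by side rather than by eye, the flat output can be grouped per frame. The parser below is a rough sketch: parse_flat_frames is a hypothetical helper, it only handles lines of the form frames.frame.N.field=value as shown above, and it keeps values as strings (with "N/A" mapped to None).

```python
from collections import defaultdict


def parse_flat_frames(text):
    """Group `ffprobe -print_format flat` frame lines into one dict per frame."""
    frames = defaultdict(dict)
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        parts = key.split(".")
        # Only accept keys shaped like frames.frame.<index>.<field>.
        if not sep or len(parts) != 4 or parts[:2] != ["frames", "frame"]:
            continue
        value = value.strip('"')
        frames[int(parts[2])][parts[3]] = None if value == "N/A" else value
    return [frames[i] for i in sorted(frames)]


# A few lines copied from the ffprobe output above.
sample = '''
frames.frame.0.pkt_pts="N/A"
frames.frame.0.pkt_dts=20
frames.frame.0.pict_type="I"
frames.frame.0.coded_picture_number=0
frames.frame.1.pkt_pts="N/A"
frames.frame.1.pkt_dts=21
frames.frame.1.pict_type="B"
frames.frame.1.coded_picture_number=3
'''

rows = parse_flat_frames(sample)
for row in rows:
    print(row["pkt_dts"], row["pict_type"], row["coded_picture_number"], row["pkt_pts"])
```

Each row can then be sorted or filtered on any field, for instance to list frames by coded_picture_number next to their pkt_dts.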

EDIT: included pict_type.

Sep 03 '22 21:09 h-mayorquin

hmmm, I can't say as I've never dealt with a video file that exhibits this behavior... I find it strange that there are no pts...

coded_picture_number refers to the order in which frames are decoded, which is counted based on I frames, P frames, and B frames.

Look at the pict_type:

ffprobe -v error video.mp4 -hide_banner -print_format flat -select_streams v:0 \
  -show_entries frame=coded_picture_number,pkt_dts,pkt_dts_time,pkt_pts,pkt_pts_time,best_effort_timestamp,best_effort_timestamp_time,pict_type -read_intervals "%+#16"

frames.frame.0.pkt_pts=0
frames.frame.0.pkt_pts_time="0.000000"
frames.frame.0.pkt_dts=0
frames.frame.0.pkt_dts_time="0.000000"
frames.frame.0.best_effort_timestamp=0
frames.frame.0.best_effort_timestamp_time="0.000000"
frames.frame.0.pict_type="I"
frames.frame.0.coded_picture_number=0
frames.frame.1.pkt_pts=1001
frames.frame.1.pkt_pts_time="0.033367"
frames.frame.1.pkt_dts=1001
frames.frame.1.pkt_dts_time="0.033367"
frames.frame.1.best_effort_timestamp=1001
frames.frame.1.best_effort_timestamp_time="0.033367"
frames.frame.1.pict_type="B"
frames.frame.1.coded_picture_number=3
frames.frame.2.pkt_pts=2002
frames.frame.2.pkt_pts_time="0.066733"
frames.frame.2.pkt_dts=2002
frames.frame.2.pkt_dts_time="0.066733"
frames.frame.2.best_effort_timestamp=2002
frames.frame.2.best_effort_timestamp_time="0.066733"
frames.frame.2.pict_type="B"
frames.frame.2.coded_picture_number=2
frames.frame.3.pkt_pts=3003
frames.frame.3.pkt_pts_time="0.100100"
frames.frame.3.pkt_dts=3003
frames.frame.3.pkt_dts_time="0.100100"
frames.frame.3.best_effort_timestamp=3003
frames.frame.3.best_effort_timestamp_time="0.100100"
frames.frame.3.pict_type="B"
frames.frame.3.coded_picture_number=4
frames.frame.4.pkt_pts=4004
frames.frame.4.pkt_pts_time="0.133467"
frames.frame.4.pkt_dts=4004
frames.frame.4.pkt_dts_time="0.133467"
frames.frame.4.best_effort_timestamp=4004
frames.frame.4.best_effort_timestamp_time="0.133467"
frames.frame.4.pict_type="P"
frames.frame.4.coded_picture_number=1
frames.frame.5.pkt_pts=5005
frames.frame.5.pkt_pts_time="0.166833"
frames.frame.5.pkt_dts=5005
frames.frame.5.pkt_dts_time="0.166833"
frames.frame.5.best_effort_timestamp=5005
frames.frame.5.best_effort_timestamp_time="0.166833"
frames.frame.5.pict_type="B"
frames.frame.5.coded_picture_number=7
frames.frame.6.pkt_pts=6006
frames.frame.6.pkt_pts_time="0.200200"
frames.frame.6.pkt_dts=6006
frames.frame.6.pkt_dts_time="0.200200"
frames.frame.6.best_effort_timestamp=6006
frames.frame.6.best_effort_timestamp_time="0.200200"
frames.frame.6.pict_type="B"
frames.frame.6.coded_picture_number=6
frames.frame.7.pkt_pts=7007
frames.frame.7.pkt_pts_time="0.233567"
frames.frame.7.pkt_dts=7007
frames.frame.7.pkt_dts_time="0.233567"
frames.frame.7.best_effort_timestamp=7007
frames.frame.7.best_effort_timestamp_time="0.233567"
frames.frame.7.pict_type="B"
frames.frame.7.coded_picture_number=8
frames.frame.8.pkt_pts=8008
frames.frame.8.pkt_pts_time="0.266933"
frames.frame.8.pkt_dts=8008
frames.frame.8.pkt_dts_time="0.266933"
frames.frame.8.best_effort_timestamp=8008
frames.frame.8.best_effort_timestamp_time="0.266933"
frames.frame.8.pict_type="P"
frames.frame.8.coded_picture_number=5
frames.frame.9.pkt_pts=9009
frames.frame.9.pkt_pts_time="0.300300"
frames.frame.9.pkt_dts=9009
frames.frame.9.pkt_dts_time="0.300300"
frames.frame.9.best_effort_timestamp=9009
frames.frame.9.best_effort_timestamp_time="0.300300"
frames.frame.9.pict_type="B"
frames.frame.9.coded_picture_number=11
frames.frame.10.pkt_pts=10010
frames.frame.10.pkt_pts_time="0.333667"
frames.frame.10.pkt_dts=10010
frames.frame.10.pkt_dts_time="0.333667"
frames.frame.10.best_effort_timestamp=10010
frames.frame.10.best_effort_timestamp_time="0.333667"
frames.frame.10.pict_type="B"
frames.frame.10.coded_picture_number=10
frames.frame.11.pkt_pts=11011
frames.frame.11.pkt_pts_time="0.367033"
frames.frame.11.pkt_dts=11011
frames.frame.11.pkt_dts_time="0.367033"
frames.frame.11.best_effort_timestamp=11011
frames.frame.11.best_effort_timestamp_time="0.367033"
frames.frame.11.pict_type="B"
frames.frame.11.coded_picture_number=12
frames.frame.12.pkt_pts=12012
frames.frame.12.pkt_pts_time="0.400400"
frames.frame.12.pkt_dts=12012
frames.frame.12.pkt_dts_time="0.400400"
frames.frame.12.best_effort_timestamp=12012
frames.frame.12.best_effort_timestamp_time="0.400400"
frames.frame.12.pict_type="P"
frames.frame.12.coded_picture_number=9
frames.frame.13.pkt_pts=13013
frames.frame.13.pkt_pts_time="0.433767"
frames.frame.13.pkt_dts=13013
frames.frame.13.pkt_dts_time="0.433767"
frames.frame.13.best_effort_timestamp=13013
frames.frame.13.best_effort_timestamp_time="0.433767"
frames.frame.13.pict_type="B"
frames.frame.13.coded_picture_number=15
frames.frame.14.pkt_pts=14014
frames.frame.14.pkt_pts_time="0.467133"
frames.frame.14.pkt_dts="N/A"
frames.frame.14.pkt_dts_time="N/A"
frames.frame.14.best_effort_timestamp=14014
frames.frame.14.best_effort_timestamp_time="0.467133"
frames.frame.14.pict_type="B"
frames.frame.14.coded_picture_number=14
frames.frame.15.pkt_pts=16016
frames.frame.15.pkt_pts_time="0.533867"
frames.frame.15.pkt_dts="N/A"
frames.frame.15.pkt_dts_time="N/A"
frames.frame.15.best_effort_timestamp=16016
frames.frame.15.best_effort_timestamp_time="0.533867"
frames.frame.15.pict_type="P"
frames.frame.15.coded_picture_number=13

Sep 04 '22 09:09 hudrazine

Well, I think pts was required and dts was optional, but it seems to be falling apart in the submitted output. You might try regenerating the timestamps to see if that improves things.

ffmpeg -fflags +genpts -i video.avi -c copy video.mp4

or

ffmpeg -fflags +genpts+igndts -i video.avi -c copy video.mp4

Sep 04 '22 10:09 hudrazine

> Well, I think pts was required and dts was optional, but it seems to be falling apart in the submitted output. You might try regenerating the timestamps to see if that improves things.
> ffmpeg -fflags +genpts -i video.avi -c copy video.mp4

This is the output of the same code as above applied to the file produced by this command:

dts = 5110, pts=4854, frame=0, time=0.316
dts = 5366, pts=5622, frame=1, time=0.366
dts = 5622, pts=5366, frame=2, time=0.349
dts = 5878, pts=5878, frame=3, time=0.383
dts = 6134, pts=5110, frame=4, time=0.333
dts = 6390, pts=6646, frame=5, time=0.433
dts = 6646, pts=6390, frame=6, time=0.416

> or ffmpeg -fflags +genpts+igndts -i video.avi -c copy video.mp4

dts = 5110, pts=4854, frame=0, time=0.316
dts = 5366, pts=5622, frame=1, time=0.366
dts = 5622, pts=5366, frame=2, time=0.349
dts = 5878, pts=5878, frame=3, time=0.383
dts = 6134, pts=5110, frame=4, time=0.333
dts = 6390, pts=6646, frame=5, time=0.433
dts = 6646, pts=6390, frame=6, time=0.416

In both cases the time still increases non-monotonically, which is strange.

Sep 05 '22 14:09 h-mayorquin

> hmmm, I can't say as I've never dealt with a video file that exhibits this behavior... I find it strange that there are no pts...
>
> coded_picture_number refers to the order in which frames are decoded, which is counted based on I frames P frames B frames.
>
> Look at the pict_type:

I edited the answer above to include this information (where I show the output of ffprobe). The result seems strange: picture type and code match what I expected, but I expected the order of the decoding timestamps to coincide with coded_picture_number (it does not):

frames.frame.0.pkt_dts=20
frames.frame.0.pict_type="I"
frames.frame.0.coded_picture_number=0
frames.frame.1.pkt_dts=21
frames.frame.1.pict_type="B"
frames.frame.1.coded_picture_number=3
frames.frame.2.pkt_dts=22
frames.frame.2.pict_type="B"
frames.frame.2.coded_picture_number=2
frames.frame.3.pkt_dts=23
frames.frame.3.pict_type="B"
frames.frame.3.coded_picture_number=4
frames.frame.4.pkt_dts=24
frames.frame.4.pict_type="P"
frames.frame.4.coded_picture_number=1
frames.frame.5.pkt_dts="N/A"
frames.frame.5.pict_type="B"
frames.frame.5.coded_picture_number=6
frames.frame.6.pkt_dts="N/A"
frames.frame.6.pict_type="P"
frames.frame.6.coded_picture_number=5

However, maybe this is not strange: the output of the video that you presented is mismatched in the same way (dts and pts are identical even for picture types that are not I).

As an aside, what do you think of relying instead on the ffprobe field called "best_effort_timestamp_time"? I guess that when the pts are not available AVReader will make a guess based on previous pts and dts.

Sep 05 '22 14:09 h-mayorquin

> Both of them have the time increasing non-monotonically which is strange.

This may be due to the fact that I did not specify fps with the -r option.

> I edited the answer above to include this information (where I show the output of ffprobe). The answer seems strange, picture type and code match what I expected but I expected that the order of the decoding timestamps should coincide with coded_picture_number (it does not)

A little research on coded_picture_number revealed that this is a number determined during the encoding process (presumably, the order in which the encoding was done). This means that it is not exactly in sync with dts.

I also found an interesting description in the FFmpeg library documentation:

Some formats misuse the terms dts and pts/cts to mean something different. Such timestamps must be converted to true pts/dts before they are stored in AVPacket. FFmpeg: AVPacket Struct Reference

If pts or dts are obviously wrong, this may be related to this.

By the way, the back and forth order of B-frames of the same type is probably related to the mode of the B-frames, such as reference B-frames and normal B-frames.

> As an aside, what do you think of relying instead of the ffprobe field called "best_effort_timestamp_time".

After all, we expect continuous frames that comply with the playback time, so if the decoding outputs frames without reordering, it is good to find a reliable value in the test and rely on that.

Sep 06 '22 11:09 hudrazine

> I also found an interesting description in the FFmpeg library documentation:

This is very useful. Thanks a lot for all your help, I have learned a lot thinking about this issue.

There is one remaining thing that really baffles me here. Where is PyAV getting the pts in my example? If you go back to the top of this thread, PyAV provides pts of 19, 22, 21. Where does it get them?

As I have shown in this thread, ffprobe shows no pts for this file, and when I tried running this directly from C I got the same result.

So why is PyAV showing pts values? I looked at the source in context.pyx, and the decode method just seems to use a combination of lib.avcodec_send_packet and avcodec_receive_frame to fill the contents of the frame object in frame.pyx, which just reads the pts property through the pointer to the AVFrame.

So where do these pts values of 19, 22, 21 come from? If you have any hunch or ideas, that would be really useful.

Here is the output from the C program:

LOG: AVStream->time_base before open coded 1/60
LOG: AVStream->r_frame_rate before open coded 60/1
LOG: AVStream->start_time 0
LOG: AVStream->duration 108026
LOG: finding the proper decoder (CODEC)
LOG: Video Codec: resolution 1280 x 960
LOG:    Codec h264 ID 27 bit_rate 1227165
LOG: AVPacket->pts -9223372036854775808
LOG: AVPacket->pts -9223372036854775808
LOG: AVPacket->pts -9223372036854775808
LOG: Frame 1 (type=I, size=267689 bytes, format=0) pts 0 key_frame 1 [DTS 0] [DPN 0] pkt_dts 20 - best_effort_ts 20
LOG: AVPacket->pts -9223372036854775808
LOG: Frame 2 (type=B, size=214 bytes, format=0) pts 0 key_frame 0 [DTS 3] [DPN 0] pkt_dts 21 - best_effort_ts 21
LOG: AVPacket->pts -9223372036854775808
LOG: Frame 3 (type=B, size=332 bytes, format=0) pts 0 key_frame 0 [DTS 2] [DPN 0] pkt_dts 22 - best_effort_ts 22
LOG: AVPacket->pts -9223372036854775808
LOG: Frame 4 (type=B, size=212 bytes, format=0) pts 0 key_frame 0 [DTS 4] [DPN 0] pkt_dts 23 - best_effort_ts 23
LOG: AVPacket->pts -9223372036854775808
LOG: Frame 5 (type=P, size=897 bytes, format=0) pts 0 key_frame 0 [DTS 1] [DPN 0] pkt_dts 24 - best_effort_ts 24
LOG: AVPacket->pts -9223372036854775808
LOG: Frame 6 (type=B, size=224 bytes, format=0) pts 0 key_frame 0 [DTS 7] [DPN 0] pkt_dts 25 - best_effort_ts 25
LOG: releasing all the resources

This might be related to this issue https://trac.ffmpeg.org/ticket/2375, as I find that 19 is indeed the pts of the packet that triggers the decoding. So I am not sure when and where that assignment is happening.

Sep 06 '22 14:09 h-mayorquin

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Jan 05 '23 02:01 github-actions[bot]