ab-av1
frame rate change using the fps filter messes up VMAF calculation
I have some video files where every other frame is a duplicate for some reason. So when transcoding, I use FFmpeg's fps filter to cut the frame rate in half to fix the issue. However, ab-av1 doesn't take the change in frame rate fully into consideration during VMAF calculation. While the fps filter defined with --vfilter is properly applied to the reference file, the issue is that ab-av1 still sets the input frame rate of both files to 24 with -r 24. In other words, a 48 fps file will still be read as a 24 fps file, leading to a desync.
How to reproduce:
ab-av1 crf-search -e libx264 --vmaf n_threads=4:n_subsample=3 --preset veryfast --min-vmaf 93 --vfilter fps=24 --cache false --input sintel.fpstest.mp4
Input file (48 fps with every other frame being a duplicate): sintel.fpstest.mp4
With an encoded sample file sintel.fpstest.x264.crf19.veryfast.mp4, the way to calculate VMAF is:
ffmpeg -hide_banner -r 24 -i sintel.fpstest.x264.crf19.veryfast.mp4 -r 48 -i sintel.fpstest.mp4 -an -sn -map 0:V -map 1:V -lavfi "[0:v]setpts=PTS-STARTPTS[dist];[1:v]fps=24,setpts=PTS-STARTPTS[ref];[dist][ref]libvmaf=n_threads=4" -f null -
The way to solve this is setting the value of -r to the native frame rate of each input file.
I'm not sure if there are other filters or corner cases (such as variable frame rate) that are also affected by this.
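As a rough illustration of the proposed fix, the manual command above can be assembled with a separate -r per input. This is only a sketch and not ab-av1's code: the function name build_vmaf_cmd and its parameters are hypothetical, and the frame rates are passed in by hand rather than detected.

```python
def build_vmaf_cmd(distorted, reference, dist_fps, ref_fps,
                   ref_filter=None, vmaf_opts="n_threads=4"):
    """Return an ffmpeg argument list mirroring the manual command above."""
    # An optional filter (e.g. "fps=24") is applied to the reference before setpts.
    if ref_filter:
        ref_chain = f"{ref_filter},setpts=PTS-STARTPTS"
    else:
        ref_chain = "setpts=PTS-STARTPTS"
    graph = (
        "[0:v]setpts=PTS-STARTPTS[dist];"
        f"[1:v]{ref_chain}[ref];"
        f"[dist][ref]libvmaf={vmaf_opts}"
    )
    return [
        "ffmpeg", "-hide_banner",
        "-r", str(dist_fps), "-i", distorted,   # rate of the encoded file
        "-r", str(ref_fps), "-i", reference,    # native rate of the source
        "-an", "-sn", "-map", "0:V", "-map", "1:V",
        "-lavfi", graph, "-f", "null", "-",
    ]

cmd = build_vmaf_cmd(
    "sintel.fpstest.x264.crf19.veryfast.mp4",
    "sintel.fpstest.mp4",
    dist_fps=24, ref_fps=48, ref_filter="fps=24",
)
print(" ".join(cmd))
```

The point of the sketch is only that the two -r values are chosen independently, one per input.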
Setting -r 24 for vmaf came from #115 & discussion #108.
My initial thoughts: We'll need to test a bunch of the cases mentioned previously if we want to change this generally.
But I think having different -r values won't generally make sense as vmaf shouldn't be expected to work properly on different framerate inputs.
However, perhaps we could put a workaround in if the "fps" vfilter is used.
I believe that a very safe way to do it would be to create a framerate arg with the following behaviour:
- If the arg is manually set to -r auto or -r 0 --> use the automatically detected native framerate.
- If the arg is manually set to a fixed framerate (also represented as fractions like -r 24000/1001) --> use the user-defined input. This would help as a failsafe method by letting the user override the framerate for certain files that may need it.
- If the arg is not manually set --> it falls back to -r 24, as it is now by default.
Doing it this way would allow changing the default behaviour of the arg from -r 24 to -r auto in the future if needed, after everything is well tested.
By using an argument, regardless of the default value, all users have alternative methods just in case a very specific file messes up vmaf, so it covers all the use cases.
What do you think?
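The three cases of the proposed arg could be resolved roughly like this. It is a sketch under assumptions: resolve_input_rate is a hypothetical name, no such option exists in ab-av1 as far as this thread shows, and detecting the native rate is left out entirely.

```python
from fractions import Fraction

DEFAULT_RATE = Fraction(24)  # the fixed value currently used

def resolve_input_rate(arg, native_rate):
    """Pick the -r value for one VMAF input.

    arg: None (not set), "auto" or "0", or a fixed rate such as "24000/1001".
    native_rate: detected frame rate of the file (detection not shown here).
    """
    if arg is None:
        return DEFAULT_RATE              # not set: keep today's behaviour
    if arg in ("auto", "0"):
        return Fraction(native_rate)     # use the file's own rate
    rate = Fraction(arg)                 # accepts "25", "24000/1001", "23.976"
    if rate <= 0:
        raise ValueError(f"invalid frame rate: {arg}")
    return rate                          # explicit user override

print(resolve_input_rate(None, 48))
print(resolve_input_rate("auto", 48))
print(resolve_input_rate("24000/1001", 48))
```

Changing the default later would then only mean changing what the "not set" branch returns.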
I have a question: to make ab-av1 work, do I need ffmpeg and also the vmaf application from the Netflix repository on GitHub, or does ab-av1 already bring that with it?
Setting -r 24 for vmaf came from https://github.com/alexheretic/ab-av1/pull/115 & discussion https://github.com/alexheretic/ab-av1/issues/108.
The exact values used for -r don't make a difference for the actual VMAF calculation, as long as they're correct relative to each other. The purpose is to sync the files by reading them at the correct rate. So if both files have the same frame rate, you can set -r to whatever value you want, and the VMAF score will be the same for any value you use. And in my use above, I can set -r for the encoded file to 50 and the reference to 100, and the VMAF score will be the same as with the command above that uses 24 and 48.
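The "only the ratio matters" point can be checked with a small timestamp model. This is a simplified sketch: it assumes an input -r of `rate` gives frame n the timestamp n/rate, and it ignores what the fps filter does afterwards. The helper name aligned is made up for this example.

```python
from fractions import Fraction

def aligned(dist_rate, ref_rate, frames=48):
    """True if distorted frame k and reference frame 2k share a timestamp."""
    # Assumption: input -r gives frame n the timestamp n / rate.
    return all(
        Fraction(k, dist_rate) == Fraction(2 * k, ref_rate)
        for k in range(frames)
    )

print(aligned(24, 48))    # True
print(aligned(50, 100))   # True
print(aligned(24, 24))    # False: both inputs read at the same rate
```

In this model any pair of rates with a 1:2 ratio lines up frame k of the encode with frame 2k of the source, while reading both inputs at the same rate does not.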
The VMAF models provided with libvmaf were trained on 24 fps footage, which means the models may or may not be accurate in estimating quality for other frame rates (see https://github.com/Netflix/vmaf/issues/446#issuecomment-703005622). Though according to this study, this isn't an issue and VMAF does work well with high frame rates. This isn't related to -r, however.
I've been meaning to write a VMAF page on the FFmpeg wiki to collect all the information and caveats of VMAF I've run into over the years. One example is that some encoders drop or duplicate identical frames, which leads to a desync during VMAF calculation because the reference and the encoded file end up with a different number of frames (this can be prevented with the use of -fps_mode passthrough during encoding, or -vsync 0 with the older/deprecated syntax).
I have a question: to make ab-av1 work, do I need ffmpeg and also the vmaf application from the Netflix repository on GitHub, or does ab-av1 already bring that with it?
@manbug10 To use ab-av1 you just need it + ffmpeg. ffmpeg needs to be built with svtav1 and vmaf enabled. See https://github.com/alexheretic/ab-av1#requirements
Working with ffmpeg sure does feel like magic sometimes (which is a bad thing).
I wonder why setting -r 24 helps at all then. We previously omitted it. I can't remember exactly, but presumably having it fixed some case, else I would have been unlikely to include it. Ideally I'd like to figure out/remember which case specifying -r 24 fixed & why.
Perhaps the idea to just omit -r when a fps= vfilter is used could still work?
Setting an -r value is recommended by VMAF developers in their documentation and it actually does help and fixes very odd vmaf scores for some files when -r is omitted (like in previous ab-av1 builds).
They used -r 24 in their example in the docs, but it does not mean other values should not work, as suggested by @veikk0. Nevertheless, some -r value is definitely needed to set a relative framerate between ref and main files, otherwise vmaf scores break sometimes.
I collected a sample test suite and did a whole bunch of tests just to get to that conclusion. I also sent you a couple of sample files and you also found a sample (SonyNY4K) that used to give inaccurate vmaf scores when omitting -r and then was fixed when using -r 24 (although that sample was 60fps).
In my many tests, I tried several different -r values for different fps samples and they all worked fine as long as they matched for both ref and main files, because my encodings and originals had equal framerates.
As that is the most common situation for most files, having a fixed -r 24 value does not seem an issue, but it could be in some edge cases like the one explained by @veikk0.
Perhaps the idea to just omit -r when a fps= vfilter is used could still work?
I have no idea. Perhaps it would still work for some cases and fail in others, the same way as omitting -r doesn't always give inaccurate results, but it does unpredictably fail sometimes. I would never omit -r. I believe the proper way to fix it is to set -r to each input file's native framerate or let the user override it.
I also reported this issue to the FFMetrics developer (https://github.com/fifonik/FFMetrics/issues/108) and he also applied the fix by adding -r to the ffmpeg line, although he decided to set it automatically according to the native framerate of the file, and added an option that lets the user manually change it to a fixed -r value if needed. Some users actually opened a couple of issues because the automatically detected framerate was causing vmaf score problems (ref and main detected framerates didn't match), so being able to manually override it helped them with those specific files. In my tests, automatic mode always worked fine, no problems whatsoever after adding -r.
That is why I suggested creating an arg as a possible solution, it would fit any situation.
Nevertheless, some -r value is definitely needed to set a relative framerate between ref and main files, otherwise vmaf scores break sometimes.
I can confirm this, it also helps when the source has variable framerate. Always using a constant framerate is very good advice in general. Otherwise it's more likely that the playing time gets stretched or squeezed in a wrong manner at some point (unnoticed). I have had this problem more than once when muxing different sources, and when the playing time is wrong, everything is garbled up.
To solve some question marks:
ffmpeg -r 24 -i source.mp4
does not drop frames; instead it stretches the playing time to fit all the frames at the new rate. For example, a 60-second, 48 fps source.mp4 will be delivered to the encoder stretched to 120 seconds.
ffmpeg -i source.mp4 -r 24
drops frames accordingly and keeps the correct playtime. This should be used if your source has variable fps or if you need to drop fps. For adding fps, the filter (and only that!) would be the better approach. For calculating VMAF it should not matter, only for the encode.
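The difference between the two placements of -r comes down to simple arithmetic. The sketch below models it for a constant frame rate source only; input_r and output_r are illustrative names, not ffmpeg internals.

```python
def input_r(frames, r):
    """-r before -i: every frame is kept and shown for 1/r seconds."""
    return frames, frames / r            # (frame count, duration in seconds)

def output_r(frames, source_fps, r):
    """-r after -i: duration is kept, frames are dropped or duplicated."""
    duration = frames / source_fps
    return round(duration * r), duration

frames = 60 * 48                          # 60-second clip at 48 fps
print(input_r(frames, 24))                # (2880, 120.0)
print(output_r(frames, 48, 24))           # (1440, 60.0)
```

The first case keeps all 2880 frames and doubles the playtime; the second keeps the 60 seconds and halves the frame count.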
The solution would be: when the fps filter is used, use the same value with -r, otherwise stay with 24. This should work 100% if dropping is needed, but I'm not so sure about adding frames.
So with --vfilter "fps=44", the correct ffmpeg part would be:
ffmpeg -i source.mp4 -r 44 -vf "fps=44" output.mp4
I have an exactly fitting case here.
https://www.ardmediathek.de/video/babylon-berlin/folge-1-oder-staffel-1-s01-e01/das-erste/Y3JpZDovL2Rhc2Vyc3RlLmRlL2JhYnlsb24tYmVybGluLzYwMzk5MTBlLTBhZDAtNDg1Ni04OWRkLWQxODdjZmZiNmMxYw
They have a region block: non-German IPs are rejected by their server. Maybe you are lucky; if not, here is a 2-minute sample, cut with
ffmpeg -i bbs1e1.mp4 -ss 03:24 -to 05:24 -c:v copy -an -sn -dn sample.mp4
https://ufile.io/p0a6wvyx
ffmpeg shows
Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 3692 kb/s, 50 fps, 50 tbr, 90k tbn (default)
mediainfo:
Frame rate mode : Constant
Frame rate : 50.000 FPS
and those 50fps are wrong. Correct would be 25fps (as with most German TV material or DVDs), and I checked the whole playtime against an ntsc-film (23.976024fps) source. The whole file has a playtime of 44:59 mins, let's round that to 45mins. 25000/23976 gives factor 1.04270 and 1.04270*45mins gives 46.92150 mins, which fits the ntsc-film source with ~47mins.
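The playtime check can be redone with exact fractions. This sketch assumes the true rates are 25 fps and 24000/1001 fps; the variable names are only for illustration.

```python
from fractions import Fraction

pal = Fraction(25)                    # assumed true rate of the broadcast
ntsc_film = Fraction(24000, 1001)     # 23.976024 fps
factor = pal / ntsc_film              # how much longer the ntsc-film version runs
minutes = float(factor * 45)          # 45-minute file at 25 fps

print(round(float(factor), 5), round(minutes, 2))   # 1.04271 46.92
```

That lands on roughly 47 minutes, in line with the comparison above.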
This vmaf calculation is correct:
ab-av1 crf-search -i sample.mp4
- crf 32 VMAF 91.64 (34%)
- crf 21 VMAF 94.11 (76%)
- crf 10 VMAF 95.98 (225%)
- crf 16 VMAF 94.91 (111%)
00:02:47 ##################################################################################################################################### (sampling crf 16, eta 0s)
Error: Failed to find a suitable crf
This gives us at least crf 21 as a starting point to work with ffmpeg directly.
Wrong:
ab-av1 crf-search --vfilter "fps=fps=25" -i sample.mp4
- crf 32 VMAF 12.33 (25%)
- crf 21 VMAF 12.36 (55%)
- crf 10 VMAF 12.40 (166%)
00:01:31 ##################################################################################################################################### (sampling crf 10, eta 0s)
Error: Failed to find a suitable crf
But
ffmpeg -i sample.mp4 -c:v libsvtav1 -pix_fmt yuv420p10le -preset 8 -crf 21 -vf "fps=fps=25" fps25.mp4
keeps correct playtime@25fps and still smooth playback, no bad 'distortion' besides the av1-compression
Wrong:
ab-av1 crf-search --vfilter "minterpolate=fps=25:mi_mode=dup" -i sample.mp4
- crf 32 VMAF 10.79 (25%)
- crf 21 VMAF 10.75 (55%)
- crf 10 VMAF 10.75 (166%)
00:01:30 ##################################################################################################################################### (sampling crf 10, eta 0s)
Error: Failed to find a suitable crf
But
ffmpeg -i sample.mp4 -c:v libsvtav1 -pix_fmt yuv420p10le -preset 8 -crf 21 -vf "minterpolate=fps=25:mi_mode=dup" minterpolatedup25.mp4
keeps correct playtime@25fps, but is not as smooth as it could be, because dup is the wrong setting if you need to drop fps. No bad 'distortion' besides the av1-compression
Wrong:
ab-av1 crf-search --vfilter "minterpolate=fps=25:mi_mode=blend" -i sample.mp4
- crf 32 VMAF 12.55 (25%)
- crf 21 VMAF 12.51 (55%)
- crf 10 VMAF 12.50 (166%)
00:01:30 ##################################################################################################################################### (sampling crf 10, eta 0s)
Error: Failed to find a suitable crf
But
ffmpeg -i sample.mp4 -c:v libsvtav1 -pix_fmt yuv420p10le -preset 8 -crf 21 -vf "minterpolate=fps=25:mi_mode=blend" minterpolateblend25.mp4
keeps correct playtime@25fps. blend works perfectly if you need to drop exactly half the fps or exactly double them and if you don't want to spend much CPU power on complete motion-compensated interpolation. No bad 'distortion' besides the av1-compression
Now let's force it raw:
ffmpeg -i sample.mp4 -r 25 -c:v libsvtav1 -pix_fmt yuv420p10le -preset 8 -crf 21 rawforced25.mp4
...
frame= 3002 fps= 54 q=21.0 Lsize= 39518kB time=00:02:00.04 bitrate=2696.8kbits/s dup=0 drop=2998 speed=2.16x
That fits, 3002+2998=6000 frames, which mediainfo showed also. Also keeps correct playtime@25fps, smooth, no bad 'distortion' besides the av1-compression
The filtered files have the same size, the raw forced wins because it's smaller:
-rw-r--r-- 1 mr44er wheel 40540660 Oct 7 16:37 fps25.mp4
-rw-r--r-- 1 mr44er wheel 40540660 Oct 7 16:52 minterpolateblend25.mp4
-rw-r--r-- 1 mr44er wheel 40540660 Oct 7 16:44 minterpolatedup25.mp4
-rw-r--r-- 1 mr44er wheel 40466182 Oct 7 17:01 rawforced25.mp4
So I think it would need (I would wish for it ;) ) a new switch to set the raw framerate. --enc is wrong for that, and --enc-input 25 is also wrong, because that sets 'ffmpeg -r 25 -i sourcefile', which would stretch the 2mins to 4mins and thus garble up VMAF as we've seen.
My suggestion: --fps-force 25 -> 'ffmpeg -i sourcefile -r 25'
Details: https://trac.ffmpeg.org/wiki/ChangingFrameRate
I forgot something: Encoding the whole file with -r 25 and -crf 21 results in half the predicted size, because the unnecessary half of the frames is dropped. So instead of - crf 21 VMAF 94.11 (76%) we have VMAF 94.11 at 38%, or VMAF 'over 100' at 76% file size :p
Has anybody ever asked why, or if, the VMAF devs "recommend" -r 24? I think there is a misunderstanding here. Their first example used raw yuv files, which make the -r 24 necessary, since they don't carry timestamps. And timestamps are the key issue here, I think. The timestamps need to be synchronized, hence the setpts=PTS-STARTPTS filter (some files have different STARTPTS), so the libvmaf filter can pick the correct frames from both inputs based on their PTS (presentation timestamp). So there really should be no need for -r 24 as an input option, unless the source has no timestamps. Also, the ffmpeg manual clearly states that -r will ignore timestamps (PTS) and display frames for 1/r duration, which clearly is wrong, because the higher fps video needs half the frame duration for synchronicity. In fact, I believe -r 24 must not be in the ffmpeg command.
BTW, I believe that the second example on the VMAF site only has -r 24 in there because the author just edited the first one. And it is basically unnecessary, since the files are mp4. The only thing that is necessary is the setpts filter, which should be the first in the chain IMHO, since other filters might mess with the PTS values. (Edit: after some more testing it turns out that setpts should apparently be last.) Has anybody ever asked if that is an actual recommendation? Because I strongly believe it is not, and nowhere could I find an explicit mention of the need for -r.
Long story short, I tested my theory with the OP's sample and lo and behold:
ffmpeg -hide_banner -nostats -i sintel.fpstest.x264.crf19.veryfast.mp4 -i sintel.fpstest.mp4 -an -sn -lavfi "[0:v]setpts=PTS-STARTPTS[dist];[1:v]setpts=PTS-STARTPTS[ref];[dist][ref]libvmaf=n_threads=4" -f null -
...
[Parsed_libvmaf_2 @ 0x76977c003580] VMAF score: 94.440089
Now, why does this work? Because all the frames are shown at their appropriate PTS and both video streams start at the same time, at least as far as libvmaf is concerned, which in turn only needs to compare the resulting frames side by side. And I also think that it is correct to not decimate frames in the reference, because there are ever so tiny differences in consecutive frames even if they were encoded from identical duplicates; that's just how the codecs work. That explains why the VMAF score is not perfect. But it almost never is anyway. Even feeding the same video stream as [dis] and [ref] results in a VMAF score <100. So I think the above 94.44 is a very reasonable outcome.
P.S.: I just checked with the OP's command, i.e. -r 24/-r 48. The result is identical. So I stand by my proposal to drop the -r 24. I would also like to see examples for the cases where it is "definitely necessary" to set it, because maybe there were some flawed assumptions involved in designing the test setups? I am also open to the possibility that I am wrong, but until I see proof, I maintain that the culprit is -r 24.
P.P.S.: Also, 24 might be a suboptimal choice since 1/24 is a periodic decimal number but 1/25 is .04. That does matter with PTS calculation, because of rounding errors, I believe.
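Whether a frame duration rounds cleanly depends on the time base the timestamps are stored in, not on its decimal expansion. As a hedged check, the sketch below assumes integer ticks of a 1/90000 time base (the 90k tbn shown in the ffmpeg stream info earlier in the thread); which time base ffmpeg actually uses when -r is given as an input option is not determined here, and ticks_per_frame is a made-up helper.

```python
from fractions import Fraction

def ticks_per_frame(fps, tbn=90000):
    """Frame duration expressed in ticks of a 1/tbn time base."""
    return Fraction(tbn) / Fraction(fps)

for fps in (24, 25, Fraction(24000, 1001)):
    ticks = ticks_per_frame(fps)
    # A denominator of 1 means the duration is a whole number of ticks.
    print(fps, ticks, ticks.denominator == 1)
```

Under that assumption both 24 and 25 fps give whole tick counts, while 24000/1001 does not.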
But I think having different -r values won't generally make sense as vmaf shouldn't be expected to work properly on different framerate inputs.
I think that assumption is wrong, as can be seen above. The two inputs are correlated, so there is no reason to expect two consecutive frames to be vastly different to begin with. And, as I understand/infer from the examples, the PTS is what matters. I have no clue of the inner workings of libvmaf, but what if it simply ignores the frame change in the filter chain with higher fps, since the PTS of the lower fps input has not changed? Or, what if it averages over two frames and compares to the one in the other input? In case the higher fps file actually does have changes in every single frame, then the VMAF should suffer, IMHO, since simply dropping half the frames results in actual loss of information.
I have an example file, which is a mix of duplicate frames and actual 60 fps content, which becomes clear when one steps through the frames. I transcoded it to an otherwise lossless (-qp 0) 30 fps (-vf fps=source_fps/2) version. The VMAF score is still a very reasonable 96.45 harmonic mean (originally reported as 88.75) and 93.73 min (edit: had [ref] and [dis] mixed up). I would like to provide a sample, but I acquired it using yt-dlp from youtube and am not sure about the policy. I could provide the yt-dlp command to get it, but I don't want to wake up sleeping dogs either, if you catch my drift.
Edit
Here is the ffmpeg command:
ffmpeg -i sample.30fps.webm -i sample.60fps.webm -an -sn -lavfi "[0:v]setpts=PTS-STARTPTS[dist];[1:v]fps=30,setpts=PTS-STARTPTS[ref];[dist][ref]libvmaf=n_threads=4:pool=harmonic_mean:ts_sync_mode=nearest:log_path=vmaf.json:log_fmt=json" -f null -
Without the fps filter the minimum drops to 24.6, which suggests that decimating frames in the reference is the more correct approach, so I was wrong on that.
Edit2: I just realized that ts_sync_mode=nearest makes a huge difference! Without it the minimum is <25.
After some more thinking and tinkering I have found a way to hack around ab-av1 forcing both input frame rates to 24 fps:
ab-av1 vmaf --reference sintel.fpstest.mp4 --distorted sintel.fpstest.x264.crf19.veryfast.mp4 --reference-vfilter 'setpts=(PTS-STARTPTS)/2,select=not(mod(n\,2))' --vmaf ts_sync_mode=nearest:pool=min
...
[2024-07-24T18:33:05Z DEBUG ab_av1::vmaf] cmd `ffmpeg -r 24 -i sintel.fpstest.x264.crf19.veryfast.mp4 -r 24 -i sintel.fpstest.mp4 -filter_complex [0:v]format=yuv420p,setpts=PTS-STARTPTS[dis];[1:v]format=yuv420p,setpts=(PTS-STARTPTS)/2,select=not(mod(n\,2)),setpts=PTS-STARTPTS[ref];[dis][ref]libvmaf=ts_sync_mode=nearest:pool=min:n_threads=6 -f null -`
...
88.379074
The two important things happening here are: 1) the vmaf filter uses timestamp rounding method 'nearest' and 2) the timestamps of the reference video are halved and only even frames selected to match the actual frame rate on the distorted input. I used pool=min to catch the worst case, where frames might be selected out of sync and thus cause a drop in VMAF score.
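The effect of halving the timestamps and keeping even frames can be checked on paper. This is a toy model, not libvmaf or ab-av1 code: it assumes -r 24 on input gives frame n the timestamp n/24, and the function names are invented for the example.

```python
from fractions import Fraction

def reference_pts(n_frames, read_rate=24):
    """PTS of the even reference frames after setpts=(PTS-STARTPTS)/2."""
    # Assumption: -r 24 on input gives frame n the timestamp n / 24.
    return [Fraction(n, read_rate) / 2 for n in range(n_frames) if n % 2 == 0]

def distorted_pts(n_frames, read_rate=24):
    """PTS of the distorted frames, read at the same -r 24."""
    return [Fraction(k, read_rate) for k in range(n_frames)]

ref = reference_pts(96)     # 48 fps source, 2 seconds of frames
dist = distorted_pts(48)    # 24 fps encode, 2 seconds of frames
print(ref == dist)          # True
```

In this model every kept reference frame lands exactly on a distorted frame's timestamp, which is what the workaround relies on.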
So it seems as if ts_sync_mode=nearest is the better choice for frame selection on libvmaf's end. Even the select filter seems unnecessary, because I get the same result without it. Not sure yet, but maybe libvmaf really only compares frames with matching timestamps and ignores the ones out of sync, i.e. every odd frame of the reference in this case.