Consider using VP9 in place of VP8 for videos
It offers better compression.
What are the drawbacks? Can you please share documents/comparison on this? I'd be happy if we could increase quality with the same target bitrate.
I'm not aware of any big drawback (maybe missing native support in older browsers/libraries). See https://bloggeek.me/vp8-vs-vp9-quality-or-bitrate/
https://caniuse.com/?search=vp9 is not clear about VP9 support in browsers. Something we should definitely know beforehand.
> If you thought VP8 is a resource hog, then expect VP9 to be a lot more voracious with its CPU requirements
Seems to refer to encoding though. The rest of the drawbacks mentioned are probably obsolete.
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
VP9 could still be sent to browsers without VP9 support. We would just need a polyfill. However, people using those browsers are unlikely to have the CPU speed to decode VP9 smoothly, and it's even slower in a polyfill. Zoom uses a WASM polyfill for video and/or audio. H264 isn't good for open source because it's patented.
It's 2023. I used to transcode my videos to VP9, but I transcode them to AV1 now. The compression quality compared to H264 is insane, as AV1 is 10x smaller with my settings even at 1080p. The videos I upload to GitHub these days are all AV1 in mp4 container. Again, the CPU speed of your target audience is in question. My laptop without AV1 hardware decode can decode the 8-bit AV1 natively in the browser smoothly, but not 10-bit. My current desktop has AV1 hardware decode, but my previous desktop can't keep up with its slow CPU.
@danielzgtg that's very interesting.
We already use ogv.js and encode in VP9. From what I've read following your comment, AV1 hardware support is still fairly recent and would probably be difficult on most of our users' hardware (although we have no numbers on our users' environments).
The AV1 bitrate quality is WOW and I'd love for us to switch. If you get a chance, I'd love an AV1 equivalent FFMPEG params list of our WebMLow preset so we could run some actual tests.
https://github.com/openzim/python-scraperlib/blob/6f93bccd2b941e76d9606972bb1d5a487ca97831/src/zimscraperlib/video/presets.py#L30-L55
I reconstructed your ffmpeg command as:

```
ffmpeg -hide_banner -i in.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v libvpx -quality best -b:v 300k -maxrate 300k -minrate 300k -qmin 30 -qmax 42 -r 24 -g 240 -codec:a libvorbis -ar 44100 -b:a 48k out.webm
```

That gives:

```
frame=18296 fps=104 q=33.0 Lsize= 57615kB time=00:12:42.29 bitrate= 619.2kbits/s dup=0 drop=4549 speed=4.31x video:29417kB audio:27908kB subtitle:0kB other streams:0kB global headers:3kB muxing overhead: 0.505121%
```

I'm testing on a video mixing live action and animation with lots of special effects.
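For reference, a minimal sketch of how such an options mapping expands into an ffmpeg argument list. The dict layout mirrors the linked presets.py; the `WEBM_LOW` name and `to_ffmpeg_args` helper are illustrative, not part of scraperlib:

```python
# Sketch: expand a scraperlib-style options mapping into an ffmpeg command line.
# Dict shape modeled on the linked presets.py; names here are assumptions.
WEBM_LOW = {
    "-codec:v": "libvpx",                  # video codec
    "-quality": "best",
    "-b:v": "300k",                        # target video bitrate
    "-maxrate": "300k",
    "-minrate": "300k",
    "-qmin": "30",
    "-qmax": "42",
    "-vf": "scale='480:trunc(ow/a/2)*2'",  # frame size
    "-r": "24",
    "-g": "240",
    "-codec:a": "libvorbis",               # audio codec
    "-ar": "44100",
    "-b:a": "48k",
}

def to_ffmpeg_args(src: str, dst: str, options: dict) -> list:
    """Build the full ffmpeg command line from an options mapping."""
    args = ["ffmpeg", "-hide_banner", "-i", src]
    for flag, value in options.items():
        args += [flag, value]
    return args + [dst]
```

`to_ffmpeg_args("in.mp4", "out.webm", WEBM_LOW)` yields the command reconstructed above.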
> ```
> "-vf": "scale='480:trunc(ow/a/2)*2'",  # frame size
> ```
480p? Should be able to decode that on CPU, especially at low bitrates. Some browsers don't even use hardware decoding for low-resolution videos.
Also thank you for that trunc part. I had to do the same thing for my scripts, and not knowing that command I complicated things with a Python wrapper.
"-codec:a": "libvorbis", # audio codec
Have you tried upgrading to libopus? The resulting audio sounds better. The Safari support for both Opus and Vorbis has the same warnings on caniuse.com.
"-maxrate": "300k", # max video bitrate
This doesn't look right. The webm command gave me 619.2kbits/s, way over 300k. ~~I doubt that's all because of audio.~~ Wow, copying using `-c:v copy` and `-an` gives 317.5kbits/s. The minrate cap also seems to be wasting space. That, together with qmin and qmax, seems to be imposing contradictory or unsatisfied constraints.
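A quick sanity check could flag such mixed rate-control modes before encoding. This is a hypothetical heuristic encoding my reading of the observation above (pinning minrate to maxrate while also constraining qmin/qmax), not an ffmpeg-documented rule; the `rate_control_warnings` name is illustrative:

```python
def rate_control_warnings(options: dict) -> list:
    """Flag option combinations that impose contradictory rate-control
    constraints (illustrative heuristic, not an ffmpeg rule)."""
    warnings = []
    has_q_bounds = "-qmin" in options or "-qmax" in options
    minrate = options.get("-minrate")
    maxrate = options.get("-maxrate")
    if minrate and maxrate and minrate == maxrate and has_q_bounds:
        # minrate == maxrate is effectively CBR; adding quantizer bounds
        # on top may be unsatisfiable, as seen with the 300k WebM preset.
        warnings.append("CBR bitrate pinning combined with qmin/qmax bounds")
    return warnings
```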
> I'd love an AV1 equivalent FFMPEG params list
I matched the quality and speed with:

```
ffmpeg -hide_banner -i in.mp4 -vf "scale='480:trunc(ow/a/2)*2',format=yuv420p10le" -c:v libsvtav1 -preset 7 -crf 35 -r 24 -g 240 -c:a libopus -b:a 48k out.mp4
```

It resulted in:

```
frame=18296 fps=123 q=35.0 Lsize= 29291kB time=00:12:42.29 bitrate= 314.8kbits/s dup=0 drop=4549 speed=5.11x video:24698kB audio:4211kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.324344%
```

Of that, 265.8kbits/s was video.
A lot of the artifacts I saw in the webm are gone. This alone shouldn't be used to measure AV1 because AV1 encoders like to smooth things out. The colors are sharper, but that might just be from the 10-bit. The definitive evidence for AV1 is that I can see faces clearly when I couldn't see some before.
| Flag | Effect |
|---|---|
| `scale=` | AV1 is better than other codecs at compressing high resolutions. This reduces file size; users' screen sizes should be considered before increasing it. |
| `format=yuv420p10le` | Improves compression and color quality. Remove to speed up encoding and decoding on some computers, at the risk of introducing artifacts and increasing file size. |
| `-preset 6` | Matched to the webm encoding speed. It's still faster and higher quality. Without it, I got 30.5 MiB instead of 28.6 MiB but speed=24.5x instead of speed=5.11x. I use `-preset 4`, which is the limit for multithreading, but that is very slow. |
| `-crf 35` | Sets compression quality. libsvtav1 hasn't implemented bitrate control properly yet, so I had to match the crf. 35 is actually the default setting. I use 60 for PowerPoints, and you could try 40 to see if that's acceptable. With 40, it still looked better than the webm and I got 237.5kbits/s and 21.6 MiB. |
| `-r 24` | Reduces the framerate to movies' 24 FPS. This assumes your input is 24+ FPS; otherwise add a check to reduce it further. |
| `-g 240` | Keyframe every 10 s. libsvtav1 doesn't have scene change detection, so this is specified to keep the keyframe interval consistent for now. Keyframes affect video seek times. |
| `-b:a 48k` | A common value for audio quality, and I can't hear a difference. With the upgrade to Opus, try lowering this. |
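Put together, the flags above could be expressed as a scraperlib-style preset. This is only a sketch: the `VIDEO_LOW_AV1` name and the dict layout are assumptions modeled on the linked WebMLow preset, not an existing scraperlib API:

```python
# Hypothetical AV1 preset mirroring the WebMLow options-dict layout.
VIDEO_LOW_AV1 = {
    "-vf": "scale='480:trunc(ow/a/2)*2',format=yuv420p10le",  # 480p, 10-bit
    "-c:v": "libsvtav1",  # SVT-AV1 encoder
    "-preset": "6",       # speed/quality trade-off matched to the webm run
    "-crf": "35",         # quality target (libsvtav1 rate control)
    "-r": "24",           # reduce framerate to 24 FPS
    "-g": "240",          # keyframe every 10 s at 24 FPS
    "-c:a": "libopus",    # audio codec
    "-b:a": "48k",        # audio bitrate
}
```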
Encode times may be a concern without a fast CPU with 16+ cores. Removing the `-preset 6` makes the encode faster while still looking better than the webm. I think decode performance is tied to bitrate, so improving compression may also make videos easier to decode. Decoding the encoded webm, with VAAPI disabled (so an AVX2 CPU only), used 5.6% CPU. The 21.6 MiB AV1 used 6.5% CPU, and the 28.6 MiB one used 7.2% CPU. So we achieved the goal of improving compression. If users' computers heat up more, we can tell them it's because we improved the video quality.
The commands should only set narrow min/max limits on the quantizer quality or bitrate, but not both. I'm seeing high failure rates in transcoding and I bet it's because it can't always match two constraints at once.
> Have you tried upgrading to libopus? The resulting audio sounds better. The Safari support for both Opus and Vorbis has the same warnings on caniuse.com.
Didn't realize ogv.js supported Opus. We shall switch indeed.
Thanks for all the details; it's a great time and quality gain for me. I will run a couple of tests and try this on an actual recipe, and if it turns out fine, we'll make it the default.
> min/max limits on the quantizer quality or bitrate, but not both
>
> I'm seeing high failure rates in transcoding
I think the encoder will just ignore contradictory options if they are supplied. At least, this is what some encoders do. If the transcoding jobs fail, it must be because of disk/network failures or running out of RAM. You could also try to remux with `-c:v copy -c:a copy` before transcoding, as that fixes some files for me. Further remux options are regenerating the timestamps, or extracting into separate files and then recombining the tracks.
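The remux suggestion can be sketched as a command builder. A minimal sketch; the `remux_args` helper is illustrative, and the timestamp-regeneration option uses ffmpeg's `-fflags +genpts`:

```python
def remux_args(src: str, dst: str, regen_timestamps: bool = False) -> list:
    """Build an ffmpeg command that rewrites the container without
    re-encoding; optionally regenerate presentation timestamps."""
    args = ["ffmpeg", "-hide_banner"]
    if regen_timestamps:
        args += ["-fflags", "+genpts"]  # input option: regenerate timestamps
    args += ["-i", src, "-c:v", "copy", "-c:a", "copy", dst]
    return args
```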
> ogv.js supported Opus
Didn't realize ogv.js supported AV1 either. With the previous 480p 8-bit encoded video, it was 35% CPU on an AVX512 CPU and 60% CPU on an AVX2-only CPU.
10-bit is not supported (https://github.com/brion/ogv.js/issues/626). mp4 container is not supported, only webm (https://github.com/brion/ogv.js/issues/443).
Indeed ffmpeg doesn't fail. I've manually tested many of the failing ffmpeg commands and they all succeeded, so the failures are probably resource related.
That said, it's important to fix because, as you indicated, options are ignored and we are thus creating larger files than we expect to.
I subscribed to those two tickets. 10-bit support seems like the most important for us. https://github.com/brion/ogv.js/commit/eef47bf6aff9de9f456e83287e9ddac74cb91beb shows there is no support and that compilation support is not the root cause.
Follow-up on my previous comment: I made some snapshots of the videos directory and re-checked the errors. The errors come from a scraper bug that requests transcoding of files that never existed.
Conflicting quality bounds and bitrate bounds really do degrade ffmpeg's output. For quality and bitrate, set only the target quality and the maximum bitrate. It encodes much faster and slashes the file size for those whiteboard videos. (Any settings to reduce keyframes or use a variable frame rate may help too.)
Also, ffmpeg is multi-threaded. Running one ffmpeg process per core results in cores^2 active threads. All that context switching is very inefficient.
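To keep the total thread count near the core count when running encodes in parallel, the per-process budget could be computed like this (a sketch; the `threads_per_encode` name is illustrative, and the result would be passed to each ffmpeg process via `-threads`):

```python
import os

def threads_per_encode(parallel_jobs: int) -> int:
    """Split available cores across parallel ffmpeg jobs so the total
    thread count stays near the core count instead of cores^2."""
    cores = os.cpu_count() or 1
    return max(1, cores // parallel_jobs)
```

For example, with 16 cores and 4 parallel jobs, each job would get `-threads 4` instead of spawning 16 threads apiece.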
Looks like VP9 is now natively supported on iOS/macOS: https://www.theoplayer.com/blog/vp9-support-now-possible-on-apple-devices-and-all-major-platforms
It's still HLS-only. We'd have to support HLS in our readers to use it.
@rgaudin tested a VP9 encoded ZIM file in Kiwix on macOS 14 and it works.
Encoding to VP9 takes significantly longer to encode than VP8 in his experiment, but the result (same quality) is a size reduction around 20%.
IMHO there is nothing stopping us from using VP9 in place of VP8 from now on.
@rgaudin @kevinmcmurtrie @benoit74 Any remark?
Works as well on Kiwix iOS (iOS 17.4.1)
I think we've still not answered the major concerns around this move:
- which browser versions started to support VP9 natively?
- do we have the proper polyfill working correctly when VP9 is not natively supported?
- is VP9 acceptable in terms of decoding "power" needed (natively / with polyfill), especially on low-end devices like phones and tablets?
And in both cases, are the corresponding requirements acceptable, especially given our commitments to support some of our core clients having to support tablets for xx years?
Where is the test ZIM so that I can at least try on our beloved tablets?
And finally, what about HLS? Are we sure we do not want to wait for HLS implementation in readers to make the move?
https://tmp.kiwix.org/ci/tests_fr_reg-videos_2024-05.zim
Note that this VP9 video has been encoded with `-c:v libvpx-vp9` instead of `-c:v libvpx` without any other change. The time it took to encode this 14 min video (90 min!!) seems to indicate we need better ffmpeg args for this. We just wanted to know whether the WKWebView in the Kiwix Apple reader supported VP9.
> which browser versions started to support VP9 natively?
https://en.wikipedia.org/wiki/VP9#Browser_support
> do we have the proper polyfill working correctly when VP9 is not natively supported?
>
> is VP9 acceptable in terms of decoding "power" needed (natively / with polyfill), especially on low-end devices like phones and tablets?
VP9 has been supported for a very long time, but in the worst case ogv.js will take care of decoding.
https://github.com/openzim/zimfarm/issues/754 would be helpful so that GPU devices can be exposed to docker containers. Even an integrated GPU's driver might boost FFmpeg a bit.
https://www.webmproject.org/about/faq/ is an important source of information
Especially the entry below, which mostly answers my concerns regarding VP9 support:
I don't know how well this expectation holds, but it means that, at least theoretically, a transition from VP8 to VP9 should be transparent for end-users.
Videos of the test ZIM are working well (playing, seeking, fullscreen) on the "recent" tablet from Orange: Chrome 87, Android 11
All except the VP9 one are also working well on the "old" tablet from Orange: Chrome 43, Android 5.1.1. On this tablet, the same VP9 video is however playing in MX Player (which was already installed on my tablet; not sure if it is installed by default or was added by a previous user) with the HW decoder. Seeking seems rather erratic, but still works after waiting a bit, so I'm quite sure it is more a problem in how MX Player handles this for web streams.
For me this proves that VP9 support is sufficient to make the transition.
Has someone already figured out (at least a bit) what the proper ffmpeg settings for VP9 encoding would be? I intend to have a look into it otherwise.
> Has someone already figured out (at least a bit) what the proper ffmpeg settings for VP9 encoding would be? I intend to have a look into it otherwise.
Might be a good opportunity to look at hardware acceleration. CPU sure is simple and generic but it's also crazy slow compared to HW-accelerated encoding (CPU or GPU)
I'm not sure hardware acceleration for VP9 encoding is worth looking into.
Wikipedia mentions it has never taken off, and other sources from Google Search and ChatGPT seem to corroborate this.
Encoders are usually tied to specific hardware (NVidia, ...). We have no idea which kind of graphics chipset would be available on our workers.
And the Zimfarm is not yet capable of passing the proper options for GPU access from inside the Docker container.
Given all this, is it really relevant to spend time on hardware encoding for VP9?
Most recent CPUs include acceleration for it from what I read
- What is our baseline with VP8?
- What would it be with VP9?
- What could we expect once accelerated (roughly)?
I found a few interesting online sources about setting up the VP9 encoder correctly (this is not an exhaustive review of existing sources):
- https://www.reddit.com/r/AV1/comments/k7colv/encoder_tuning_part_1_tuning_libvpxvp9_be_more/
- https://developers.google.com/media/vp9/settings/vod
I did a few tests around these VP9 recommendations (mostly centered around Google's recommendation for VOD, i.e. with 2 passes) and unfortunately they were not conclusive: the final video was bigger than VP8 (using v2 encoder presets from scraperlib) and took longer to encode on my Mac M1 Pro with the https://tmp.kiwix.org/ci/test-videos/ted-fast-movements/ted-the-trick-to-regaining-your-childlike-wonder-zack-king-raw.mp4 video.
VP8 args:

```
-codec:v libvpx -quality best -b:v 128k -qmin 18 -qmax 40 -vf scale='480:trunc(ow/a/2)*2' -an
```

VP9 args, pass 1:

```
-codec:v libvpx-vp9 -b:v 150k -minrate 75k -maxrate 218k -tile-columns 0 -g 240 -threads 2 -quality good -crf 37 -vf scale='480:trunc(ow/a/2)*2' -an -pass 1 -speed 4
```

pass 2:

```
-codec:v libvpx-vp9 -b:v 150k -minrate 75k -maxrate 218k -tile-columns 0 -g 240 -threads 2 -quality good -crf 37 -vf scale='480:trunc(ow/a/2)*2' -an -pass 2 -speed 1
```
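The two passes differ only in `-pass` and `-speed`, so both commands can be generated from one base. A sketch under that assumption (the `vp9_two_pass` helper is illustrative; pass 1 discards its output through ffmpeg's null muxer since only the stats log matters):

```python
# Shared VP9 arguments from the tests above; only -pass and -speed differ.
BASE_VP9 = [
    "-codec:v", "libvpx-vp9", "-b:v", "150k", "-minrate", "75k",
    "-maxrate", "218k", "-tile-columns", "0", "-g", "240",
    "-threads", "2", "-quality", "good", "-crf", "37",
    "-vf", "scale='480:trunc(ow/a/2)*2'", "-an",
]

def vp9_two_pass(src: str, dst: str):
    """Return the pass-1 and pass-2 ffmpeg command lines for VP9 VOD encoding."""
    pass1 = ["ffmpeg", "-i", src, *BASE_VP9, "-pass", "1", "-speed", "4",
             "-f", "null", "/dev/null"]  # pass 1: write stats log only
    pass2 = ["ffmpeg", "-i", src, *BASE_VP9, "-pass", "2", "-speed", "1", dst]
    return pass1, pass2
```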
My feeling now is that migrating to VP9 will require significant effort to find the proper settings AND to adapt scrapers/presets to 2-pass encoding (which seems strongly recommended).
For anyone willing to help, videos to use for tests are located at https://tmp.kiwix.org/ci/test-videos/ (we have 3 videos; use the raw.mp4 in all 3 folders).
Target size is `-vf scale='480:trunc(ow/a/2)*2'` (unless this proves to be significantly problematic for VP9).
Constrained Quality mode seems not to work. It follows the requested average bitrate so tightly that it's essentially CBR. The crf, minrate, and maxrate options do nothing except at extreme nonsense values. The q value thrashes beyond sane values while encoding.
I'm not sure if it's broken or if there are other considerations not documented.
The average bitrate config in VP9 seems to have an extremely tiny window for measurement. It can fit an I-frame to reduce flicker, but high motion frames can't borrow from low motion frames. 2-pass doesn't impress me.
A workaround I found is to pretend the average bitrate doesn't exist; you can only set the limit using the `-b:v` option.

```
ffmpeg -i infile.mp4 -vf "scale='480:trunc(ow/a/2)*2'" -c:v vp9 -b:v 600k -minrate 0 -maxrate 1200k -crf 30 -g 240 -quality good -speed 0 -auto-alt-ref 1 -lag-in-frames 25 -undershoot-pct 100 -overshoot-pct 100 -codec:a libvorbis -b:a 48k outfile-b.webm
```
This produces a 12.6 MB file from the TED sample in 1 pass that's pretty clean. I'm not even sure the maxrate and overshoot-pct here do much; they seem to reduce some flicker, but I'm not sure.