FileConverter

Implement hardware acceleration

Open • tacheometry opened this issue 1 year ago • 6 comments

This PR addresses Issue #392.

I've added some NVIDIA CUDA FFmpeg arguments, only for the MP4 format for now, but it's a good start. What remains is:

  • [ ] Supporting AMD acceleration
  • [x] Supporting NVIDIA acceleration
  • [ ] Implementing the arguments for other video formats as well (right now they only apply to MP4)
  • [x] Using localization for the option strings
  • [x] Adding a setting in the program interface to choose which type of acceleration to use (off by default)
  • [x] Maybe: hardware acceleration for the entire transcoding process. Right now only encoding and decoding are accelerated.
    • This isn't trivial. For CUDA it requires changing filter names (scale to scale_cuda) and encoder options (crf to qp), for example. I tried doing this but couldn't get it to run without errors. It will take a lot of experimentation.
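The filter rename described above can be sketched as a string rewrite. This minimal, hypothetical helper (`to_cuda_filter` is not part of FileConverter) handles only the `scale=W:H,format=FMT` chain that the presets use and folds it into the single `scale_cuda=W:H:format=FMT` filter used in the fully accelerated benchmark command; any other filter chain raises an error instead of being guessed at.

```python
import re

# Matches the CPU chain the presets use: "scale=W:H,format=FMT".
# W and H are expressions like trunc(iw*1/2)*2, which contain no ':' or ','.
_CPU_CHAIN = re.compile(r"^scale=([^:,]+):([^:,]+),format=([A-Za-z0-9_]+)$")


def to_cuda_filter(vf: str) -> str:
    """Rewrite a CPU scale+format chain into a single scale_cuda filter.

    Hypothetical helper: it only understands the exact chain shown in the
    benchmarks and raises ValueError for anything else, because other
    filters have no drop-in CUDA equivalent.
    """
    match = _CPU_CHAIN.match(vf)
    if match is None:
        raise ValueError(f"unsupported filter chain: {vf!r}")
    width, height, pix_fmt = match.groups()
    # scale_cuda takes the output pixel format as one of its own options,
    # so the separate format filter is folded into it.
    return f"scale_cuda={width}:{height}:format={pix_fmt}"


if __name__ == "__main__":
    # prints scale_cuda=trunc(iw*1/2)*2:trunc(ih*1/2)*2:format=yuv420p
    print(to_cuda_filter("scale=trunc(iw*1/2)*2:trunc(ih*1/2)*2,format=yuv420p"))
```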

All help is greatly appreciated 😀 This is the first C# program I've edited...

With these arguments (which accelerate only the encoding and decoding part of the process), transcoding a 176 MB video to ~8 MB with the To Mp4 (low quality) preset was 2-3x faster than before.

Development

I couldn't find much information on how to contribute to this program, but this is what I've learned so far:

Installation

The Magick.Native-Q16-x64.dll file must be copied to bin/x64/Debug for the program to compile. To obtain it, follow the Magick.NET compilation guide and then copy it from C:\Users\xxxx\.nuget\packages\magick.native.
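The copy step can be scripted once the package is in the NuGet cache. A sketch, assuming the cache lives under the current user's home directory (`~/.nuget/packages/magick.native`, matching the path above) and that the DLL sits somewhere below one of the package's version folders; it searches recursively instead of guessing the exact subfolder. `find_magick_dll` and `copy_magick_dll` are hypothetical helpers, not part of the repository.

```python
import shutil
from pathlib import Path

DLL_NAME = "Magick.Native-Q16-x64.dll"


def find_magick_dll(package_root: Path) -> Path:
    """Return a copy of the DLL found anywhere under the NuGet package folder."""
    candidates = sorted(package_root.rglob(DLL_NAME))
    if not candidates:
        raise FileNotFoundError(f"{DLL_NAME} not found under {package_root}")
    # With several package versions installed this simply takes the last match
    # in path order; check it if you need a specific version.
    return candidates[-1]


def copy_magick_dll(package_root: Path, target_dir: Path) -> Path:
    """Copy the DLL into the build output folder (e.g. bin/x64/Debug)."""
    source = find_magick_dll(package_root)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / DLL_NAME))


# Example, run from the project folder:
# copy_magick_dll(Path.home() / ".nuget" / "packages" / "magick.native",
#                 Path("bin") / "x64" / "Debug")
```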

Testing

The whole project can be built and the installer run, but installing requires a system restart. A more efficient option is to call FileConverter.exe directly. After building the FileConverter solution, you should find it at Application/FileConverter/bin/x64/Debug/FileConverter.exe.

Opening this file directly shows the tutorial window. Instead, run it from the command line like so:

.\FileConverter.exe --verbose --conversion-preset "To Mp4 (low quality)" "C:\Users\DevAccount\Desktop\cs.mp4"

This is equivalent to right-clicking the file and choosing the preset from the context menu, without the extra steps.
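For repeated testing, the same command line can be driven from a script. A minimal sketch that builds the argument list shown above and runs it with `subprocess`; the executable path is the Debug output folder mentioned earlier, passing several files as trailing arguments is an assumption, and `build_fileconverter_args`/`run_fileconverter` are hypothetical helpers. Passing a list rather than one string avoids quoting problems with preset names that contain spaces and parentheses.

```python
import subprocess
from pathlib import Path

EXE = Path("Application/FileConverter/bin/x64/Debug/FileConverter.exe")


def build_fileconverter_args(exe: Path, preset: str, files: list[str],
                             verbose: bool = True) -> list[str]:
    """Build the argv for a direct FileConverter.exe conversion."""
    args = [str(exe)]
    if verbose:
        args.append("--verbose")
    args += ["--conversion-preset", preset]
    args += files  # assumption: input files go last, as in the example above
    return args


def run_fileconverter(preset: str, files: list[str]) -> None:
    """Run the built executable; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_fileconverter_args(EXE, preset, files), check=True)


# Example (Windows, after building the solution):
# run_fileconverter("To Mp4 (low quality)", [r"C:\Users\DevAccount\Desktop\cs.mp4"])
```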

Resources:

  • https://docs.nvidia.com/video-technologies/video-codec-sdk/12.0/ffmpeg-with-nvidia-gpu/index.html
  • https://trac.ffmpeg.org/wiki/HWAccelIntro
  • https://github.com/HeiSir2014/ffmpeg-wiki
  • https://stackoverflow.com/a/55747785
  • https://lists.ffmpeg.org/pipermail/ffmpeg-user/2017-July/036820.html
  • various StackExchange answers you might find

tacheometry, Mar 26 '24

I managed to get fully accelerated transcoding working. The additional speedup on top of accelerated encoding and decoding is smaller, but for long videos it should be very useful.

tacheometry, Mar 27 '24

Benchmarks

Note: ffmpeg.exe below refers to FileConverter\Application\FileConverter\bin\x64\Debug\ffmpeg.exe; I'm not using the system ffmpeg.

Commands

Instead of modifying and rebuilding the program for every test, I edited the ffmpeg command that FileConverter actually runs, and ported my changes into the program once everything worked.

The following commands are the ones executed for the To Mp4 (low quality) preset, except that I changed -n to -y and added -benchmark. The execution time is the rtime value (e.g. rtime=1.234s) in ffmpeg's benchmark output.
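To collect rtime across many runs without reading the console by eye, the value can be pulled out of ffmpeg's output. A minimal sketch, assuming the `-benchmark` summary contains `rtime=<seconds>s` as described above and is written to stderr; `parse_rtime` and `run_benchmark` are hypothetical helpers, and `run_benchmark` expects the full argument list, including `-benchmark`.

```python
import re
import subprocess

# -benchmark reports wall-clock time as "rtime=<seconds>s".
_RTIME = re.compile(r"\brtime=(\d+(?:\.\d+)?)s\b")


def parse_rtime(ffmpeg_stderr: str) -> float:
    """Return the last rtime value (in seconds) found in ffmpeg's output."""
    matches = _RTIME.findall(ffmpeg_stderr)
    if not matches:
        raise ValueError("no rtime= value found; was -benchmark passed?")
    return float(matches[-1])


def run_benchmark(args: list[str]) -> float:
    """Run one ffmpeg command (which must include -benchmark) and return rtime."""
    result = subprocess.run(args, capture_output=True, text=True,
                            errors="replace", check=True)
    return parse_rtime(result.stderr)
```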

Acceleration off

This is used by FileConverter by default.

ffmpeg.exe -y -stats -i "input.mp4" -c:v libx264 -preset medium -crf 31 -c:a aac -qscale:a 0.75 -vf "scale=trunc(iw*1/2)*2:trunc(ih*1/2)*2,format=yuv420p" "output.mp4" -benchmark

HW accelerated encoding and decoding, CPU scaling

ffmpeg.exe -y -stats -hwaccel cuda -i "input.mp4" -c:v h264_nvenc -preset medium -crf 31 -c:a aac -qscale:a 0.75 -vf "scale=trunc(iw*1/2)*2:trunc(ih*1/2)*2,format=yuv420p" "output.mp4" -benchmark

Fully HW accelerated transcoding

ffmpeg.exe -y -stats -hwaccel cuda -hwaccel_output_format cuda -i "input.mp4" -c:v h264_nvenc -preset medium -crf 31 -c:a aac -qscale:a 0.75 -vf "scale_cuda=trunc(iw*1/2)*2:trunc(ih*1/2)*2:format=yuv420p" "output.mp4" -benchmark
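The three variants above differ in only a few places: `-hwaccel cuda` for GPU decoding, `-hwaccel_output_format cuda` to keep decoded frames in GPU memory, the encoder (`libx264` vs `h264_nvenc`), and `scale_cuda` in place of `scale` plus `format`. A minimal sketch that regenerates these exact argument lists from a mode; `HwAccel` and `build_ffmpeg_args` are hypothetical names, not FileConverter's actual code. It keeps `-crf 31` exactly as in the benchmarked commands, even though the checklist above notes that CUDA may need `qp` instead of `crf`.

```python
from enum import Enum


class HwAccel(Enum):
    OFF = "off"                # CPU decode, scale and encode (libx264)
    CUDA_CODEC = "cuda-codec"  # GPU decode + encode, CPU scaling
    CUDA_FULL = "cuda-full"    # frames stay on the GPU, scaled with scale_cuda


SCALE_W = "trunc(iw*1/2)*2"
SCALE_H = "trunc(ih*1/2)*2"


def build_ffmpeg_args(mode: HwAccel, src: str, dst: str) -> list[str]:
    """Reproduce the three benchmark command lines as an argv list."""
    args = ["ffmpeg.exe", "-y", "-stats"]
    if mode is not HwAccel.OFF:
        args += ["-hwaccel", "cuda"]
    if mode is HwAccel.CUDA_FULL:
        # Keep decoded frames in GPU memory so scale_cuda can consume them.
        args += ["-hwaccel_output_format", "cuda"]
    args += ["-i", src]
    encoder = "libx264" if mode is HwAccel.OFF else "h264_nvenc"
    args += ["-c:v", encoder, "-preset", "medium", "-crf", "31",
             "-c:a", "aac", "-qscale:a", "0.75"]
    if mode is HwAccel.CUDA_FULL:
        vf = f"scale_cuda={SCALE_W}:{SCALE_H}:format=yuv420p"
    else:
        vf = f"scale={SCALE_W}:{SCALE_H},format=yuv420p"
    args += ["-vf", vf, dst, "-benchmark"]
    return args
```

The returned list can be passed straight to `subprocess.run`, which sidesteps shell quoting of the filter expressions.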

Results

input.mp4 is a 30-second, 1920x1080, 176 MB file.

To Mp4 (low quality) (1x scaling)

  • HW accel off: 14.6s
  • HW accelerated encode/decode: 5.7s (2.56x faster than base)
  • Fully accelerated transcode: 5.3s (2.75x faster than base)

To Mp4 (lowER quality) (0.5x scaling)

This is a custom preset I made that changes the scaling from 100% to 50%.

  • HW accel off: 6.2s
  • HW accelerated encode/decode: 4.3s (1.44x faster than base)
  • Fully accelerated transcode: 3.3s (1.87x faster than base)
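The speedup figures are the base time divided by the accelerated time. A quick check with the timings copied from above; the reported figures match when truncated (not rounded) to two decimals, e.g. 6.2 / 3.3 ≈ 1.879 is listed as 1.87. `speedup` is a hypothetical helper.

```python
import math


def speedup(base_s: float, accel_s: float) -> float:
    """How many times faster accel_s is than base_s, truncated to 2 decimals."""
    # Truncation (not rounding) reproduces the figures reported above.
    return math.floor(base_s / accel_s * 100) / 100


RESULTS = {
    "To Mp4 (low quality)": {"off": 14.6, "encode/decode": 5.7, "full": 5.3},
    "To Mp4 (lowER quality)": {"off": 6.2, "encode/decode": 4.3, "full": 3.3},
}

if __name__ == "__main__":
    for preset, times in RESULTS.items():
        for mode in ("encode/decode", "full"):
            print(f"{preset}: {mode} {speedup(times['off'], times[mode])}x")
```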

tacheometry, Mar 27 '24

I compiled @tacheometry's version and ran a few tests: 3 runs with Hardware acceleration mode = Nvidia (CUDA) and 3 runs with Hardware acceleration mode = Off. Each time, CUDA was at least 2 times faster than CPU processing.

[Screenshots: no hardware acceleration; hardware acceleration]

broscoi, Mar 27 '24

@Tichau Can I get a review on this?

tacheometry, Apr 05 '24

I think the easiest way is to replace the ffmpeg binary in FileConverter with a self-compiled build that has all GPU video-processing features enabled. That way it will work on any machine, whether it has an NVIDIA, Intel, or AMD graphics card.
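Whichever build is shipped, the program would still need to know at runtime which acceleration methods that ffmpeg binary supports. One way to sketch this, assuming the bundled ffmpeg path is known, is to ask ffmpeg itself with `-hwaccels`, which prints a `Hardware acceleration methods:` header followed by one method per line. A method being compiled in does not guarantee that the installed GPU and driver can use it, so a real implementation would still need a CPU fallback. `parse_hwaccels` and `available_hwaccels` are hypothetical names.

```python
import subprocess


def parse_hwaccels(output: str) -> list[str]:
    """Extract method names from `ffmpeg -hwaccels` output."""
    lines = [line.strip() for line in output.splitlines()]
    try:
        start = lines.index("Hardware acceleration methods:") + 1
    except ValueError:
        return []  # header missing: treat as "no acceleration available"
    return [name for name in lines[start:] if name]


def available_hwaccels(ffmpeg: str = "ffmpeg.exe") -> list[str]:
    """Ask the given ffmpeg build which hwaccel methods it was compiled with."""
    result = subprocess.run([ffmpeg, "-hide_banner", "-hwaccels"],
                            capture_output=True, text=True, check=True)
    return parse_hwaccels(result.stdout)
```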

ItsukaHiro, Apr 28 '24

The Magick.NET compilation guide doesn't exist anymore ¯\_(ツ)_/¯

ehjr5u, Sep 09 '24