Multi-Plexer: CUDA-accelerated tonemap filter

The initial idea of using POCL as a CUDA translation layer isn't viable, because POCL doesn't work with image formats on its CUDA backend.

Currently reaching out to Yaroslav Pogrebnyak, the developer of the vf_overlay_cuda FFmpeg filter.

Reaching out to nyanmisaka, who seems to have a lot of experience working on FFmpeg filters and frankly knows more than I do.

In addition to this, collaborating with Ed Borasky to confirm it works on Jetson platforms.

vf_tonemap_cuda.txt (renamed from .c to .txt to make GitHub happy)

Missing: tonemap.cu with proper kernel-side code. This is easy once I know how to properly call the CUDA kernel from the FFmpeg side.

Standard stride blocks should work: define the total number of blocks using the height. Most resolutions will be 16:9, so by using the height we have a higher chance of it dividing cleanly by 3, which lets us take advantage of CUDA's language data structures.
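For comparison, a common way to size the launch grid is ceiling division over both width and height, with each thread bounds-checking its own (x, y); FFmpeg's existing CUDA filters appear to use a pattern along these lines. The host-side C sketch below assumes a 32x16 block size, and the names div_up, launch_dims and LaunchDims are hypothetical, not taken from vf_tonemap_cuda.

```c
/* Assumed block size (threads per block in x and y); 32x16 is an
   assumption for this sketch, not a value from vf_tonemap_cuda. */
#define BLOCK_X 32u
#define BLOCK_Y 16u

/* Ceiling division: how many blocks of size `block` cover `n` items. */
static unsigned div_up(unsigned n, unsigned block)
{
    return (n + block - 1) / block;
}

/* Hypothetical container for the grid size handed to the kernel launch. */
typedef struct {
    unsigned grid_x; /* blocks across the width */
    unsigned grid_y; /* blocks down the height  */
} LaunchDims;

/* Cover the whole frame for any resolution: the last row/column of blocks
   may overhang the frame, so the kernel must skip threads with
   x >= width or y >= height. */
static LaunchDims launch_dims(unsigned width, unsigned height)
{
    LaunchDims d = { div_up(width, BLOCK_X), div_up(height, BLOCK_Y) };
    return d;
}
```

For example, launch_dims(1920, 1080) gives a 60 x 68 grid: 1080 is not a multiple of 16, so the last block row is partly idle, which the in-kernel bounds check handles without depending on the aspect ratio.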

The other option is working on the R, G and B values of a given pixel, so the number of values is guaranteed to be a multiple of 3. This might also help for other tonemapping algorithms that use the relative offset from local peak luma as input for the tonemapping output.
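A sketch of the per-pixel approach in C, assuming a packed 8-bit RGB frame (3 bytes per pixel, 8-bit only for simplicity) with a row pitch in bytes, and BT.709 luma weights. rgb_offset and pixel_luma are hypothetical names; in a CUDA kernel the same arithmetic would run once per thread, with (x, y) derived from the block and thread indices.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of channel c (0 = R, 1 = G, 2 = B) for pixel (x, y) in a
   packed 8-bit RGB frame. pitch is the row stride in bytes and may be
   larger than width * 3 because of row padding. */
static size_t rgb_offset(size_t x, size_t y, size_t pitch, size_t c)
{
    return y * pitch + x * 3 + c;
}

/* Luma of one pixel using BT.709 weights. Assumes the stored values are
   already linear; real HDR input would first need its transfer function
   undone. */
static float pixel_luma(const uint8_t *frame, size_t x, size_t y, size_t pitch)
{
    float r = frame[rgb_offset(x, y, pitch, 0)];
    float g = frame[rgb_offset(x, y, pitch, 1)];
    float b = frame[rgb_offset(x, y, pitch, 2)];
    return 0.2126f * r + 0.7152f * g + 0.0722f * b;
}
```

A local-peak operator could then compare each pixel's luma against the maximum luma in a neighbourhood; how that neighbourhood is chosen is left open here.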

Jan 10 '21 15:01 FCLC

Major bodge, but the new version of /cuda_filter/vf_scale_cuda.cu should be able to do Reinhard tonemapping. Below is an extract from the ffmpeg-devel mailing list relevant to this:

For ease of development, I've kept everything the same, including the name of the filter, only changing the function within the file. This is very much a bodge to facilitate development. As such, for testing, this file should replace ffmpeg/libavfilter/vf_scale_cuda.cu.

FFmpeg should then be compiled as standard for CUDA filters, and the filter should be called as you would call the standard vf_scale_cuda filter. The command would be similar to:

ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf scale_cuda=Source_width:Source_Height -c:a copy -c:v h264_nvenc -b:v 5M output.mp4

The above should decode in hardware, tonemap the frame on the GPU and re-encode in hardware at a given bitrate.
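For reference, the simplest global form of the Reinhard operator maps a linear-light value v to v / (1 + v), compressing [0, inf) into [0, 1). The C sketch below applies it per channel to linear RGB floats and quantizes to 8 bits. It is a host-side reference for checking GPU output, not the code in the modified vf_scale_cuda.cu; the per-channel application, the "1.0 = reference white" scaling and the function names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple global Reinhard operator: maps [0, inf) into [0, 1). */
static float reinhard(float v)
{
    return v / (1.0f + v);
}

/* Tonemap n_pixels of packed linear-light RGB floats (3 per pixel) into
   8-bit values, channel by channel. Assumes 1.0 = reference white. */
static void tonemap_reinhard_rgb(const float *in, uint8_t *out, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels * 3; i++) {
        float v = in[i] < 0.0f ? 0.0f : in[i]; /* clamp negative inputs */
        float t = reinhard(v);                 /* now in [0, 1) */
        out[i] = (uint8_t)(t * 255.0f + 0.5f); /* round to nearest */
    }
}
```

For example, an input of 1.0 (reference white) maps to 0.5 and is stored as 128, while 3.0 maps to 0.75 and is stored as 191, so highlights are compressed rather than clipped.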

Jan 14 '21 20:01 FCLC

Used the overlay filter as a base instead of scale; it seems much better for my purposes. Reach out to @znmeb to ask for a test on his side.

The syntax will be: ffmpeg -i INPUT -i INPUT -filter_complex 'hwupload_cuda,overlay_cuda' OUTPUT

This is a bodge for now, since it only modifies the output of the CUDA kernel itself.

To use it, replace ffmpeg/libavfilter/vf_overlay_cuda.cu with this file: https://github.com/Camofelix/Jetson_ffmpeg_trancode_cluster/blob/master/cuda_filter/vf_overlay_cuda.cu

It compiles fine, but I can't test actual usage without a Jetson Nano.

This requires building FFmpeg yourself:

1. Fetch the FFmpeg source code with git
2. make clean
3. ./configure --enable-nonfree --enable-cuda
4. mv ~/path/to/new/file ~/path/to/ffmpeg/libavfilter/vf_overlay_cuda.cu
5. make -j
6. Get coffee
7. ffmpeg -i INPUT -i INPUT -filter_complex 'hwupload_cuda,overlay_cuda' OUTPUT
8. ffplay OUTPUT
9. Is the output different?

Test file: https://4kmedia.org/lg-new-york-hdr-uhd-4k-demo/

Jan 20 '21 16:01 FCLC

Further update: I'm currently concerned about whether the Jetson will be able to use the standard library of CUDA filters in tandem with decoding and encoding, ideally without making too many memory copies.

Jan 26 '21 03:01 FCLC

In my test, OpenCL is necessary. The CUDA-accelerated filters are usable.

Nov 21 '21 15:11 AnterCreeper

You just need to:

git clone nv-codec-headers
build FFmpeg with --enable-cuda --enable-cuda-nvcc
enjoy

However, the only usable filters are scale_cuda and yadif_cuda😅 Maybe I need to backport some features, like tonemap_cuda etc.

Nov 21 '21 15:11 AnterCreeper

I haven't looked at this project in a while; I've been working on other GPGPU-related things.

Have they ported the CUDA filters to work on the Nano?

Nov 21 '21 15:11 FCLC

I don't know. I just tested it and finally built it successfully.

Nov 21 '21 15:11 AnterCreeper

The speed... 😅 Not too bad, and the GPU usage is low. H.264 1080p -> H.265 720p at 6 Mbps.

Nov 21 '21 15:11 AnterCreeper

Command:

sudo ffmpeg -init_hw_device cuda=gpu:0.0 -filter_hw_device gpu -c:v h264_nvmpi -i /mnt/source/Paprika.2006.JAPANESE.1080p.BluRay.x264.DTS-FGT.mkv -vf "format=yuv420p,hwupload,scale_cuda=1280:720,hwdownload,format=yuv420p" -c:v hevc_nvmpi -b:v 6000k -preset medium -profile:v high -acodec ac3 output.mp4

Nov 21 '21 15:11 AnterCreeper

I should also say that my device is throttled due to low current and voltage. 😅 The power supply is utter garbage.

Nov 21 '21 15:11 AnterCreeper

cuvid is unusable because libnvcuvid.so.1 is missing. Due to the special architecture of the Jetson(?), things are quite different.

Nov 21 '21 15:11 AnterCreeper
