How to manage resources like mpv
While using mpv with a glsl-shaders config it doesn't take many resources, but when I try to launch it via the command line:
mpv in.mkv --vf=vapoursynth="extension.vpy" --hwdec=nvdec --vo=gpu -end 30 --ovc=libx265 --ovcopts=crf=18 --gpu-api=d3d11 --oac=libopus --oacopts=b=32000 --ovc=libx264 --vo=gpu --profile=gpu-hq --hwdec=nvdec-copy -o out.mkv
I get almost full CPU/RAM usage, while plain mpv playback stays under 10% with the same shader set.
Can you please suggest how to overcome this issue? Any advice is appreciated, even a C++ workaround.
This also applies to ffmpeg:
vspipe --y4m extension.vpy - | ffmpeg -t 30 -i pipe: out.mp4
All the script does is call core.placebo.Shader(clip, shader=f'{shaders_path}/gpu_work.glsl')
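For reference, the whole script is essentially the following (a minimal sketch; the `ffms2` source filter, the `shaders_path` value, and the format conversion are my assumptions, not from the thread):

```python
# example.vpy -- minimal VapourSynth script wrapping the libplacebo shader
import vapoursynth as vs

core = vs.core
shaders_path = '/path/to/shaders'  # hypothetical location of the .glsl file

clip = core.ffms2.Source('in.mkv')  # decoding happens on the CPU
# placebo.Shader may expect a high-bit-depth format; convert if needed
clip = core.resize.Bicubic(clip, format=vs.YUV444P16)
clip = core.placebo.Shader(clip, shader=f'{shaders_path}/gpu_work.glsl')  # runs on the GPU
clip.set_output()
```

This is run either through vspipe or via mpv's --vf=vapoursynth option; either way the frames cross the CPU/GPU boundary around the Shader call.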
This is probably because libplacebo is trying to run on a GPU, whereas VapourSynth is a CPU framework. This causes high overhead in PCIe transfers and synchronization. You should only run CPU filters through VapourSynth.
@sekrit-twc Got it. So the only way is to wrap placebo in custom C++ code? No premade solutions? It's also kind of weird, because I've seen code for CNN (GAN) image processing inside VapourSynth; you don't want to do that on a CPU.
libplacebo only runs on GPU. In order to improve the performance, you would need to rewrite it to execute shaders on CPU, or use a software GL/VK/D3D emulator. The reason your shader runs much faster in mpv is that the video data is already on the GPU. Compare these pipelines:
mpv:
ReadBitsFromDisk <-- On CPU
SendCompressedToGPUHWDec <-- On GPU
DecompressToTexture <-- On GPU
ExecuteShader <-- On GPU
PresentBuffer <-- On GPU
VS:
ReadBitsFromDisk <-- On CPU
DecompressToRAM <-- On CPU
UploadToTexture <-- On GPU
ExecuteShader <-- On GPU
DownloadFromTexture <-- On CPU
CopyToPipe <-- On CPU
Fundamentally, as a VS plugin, your shader cannot run any faster, because the data needs to be retrieved from the GPU before it can be used. When using libplacebo in a VS script executed by mpv, it is even worse, because now the frame must be copied several times:
mpv+VS:
ReadBitsFromDisk <-- On CPU
SendCompressedToGPUHWDec <-- On GPU
DecompressToTexture <-- On GPU
DownloadFromTexture <-- On CPU / sending to VS
UploadToTexture <-- On GPU
ExecuteShader <-- On GPU
DownloadFromTexture <-- On CPU / returning to mpv
UploadToTexture <-- On GPU
PresentBuffer <-- On GPU
Thank you for a very detailed pipeline description!
As you've stated, the cmd mpv solution takes more time compared to the vspipe solution:
MPV cmd solution:
encoded 500 frames in 41.62s (12.01 fps):
mpv test_in.mp4 --glsl-shaders="~~/shaders/gpu_work.glsl;" --hwdec=nvdec-copy --hwdec-codecs=all --vo=gpu --ovc=libx265 --ovcopts=crf=7 --gpu-api=d3d11 --oac=libopus --oacopts=b=32000 --profile=gpu-hq --vf=vapoursynth="example.vpy":2:64 -o test_out.mp4
vspipe cmd solution:
Output 499 frames in 34.35 seconds (14.53 fps):
vspipe --y4m example.vpy - | ffmpeg -i pipe: -c:v libx264 -crf 7 test_out.mp4
At this point it makes no sense to me why transferring from the GPU (DownloadFromTexture) and sending (CopyToPipe) eat the remaining 9.47 fps (24.00 − 14.53, assuming the video is 24 fps), while 'almost the same' procedure happens in the mpv+VS pair (+CopyToPipe via the -o mpv option) and only costs 2.52 fps (14.53 − 12.01). (My idea is to look at the time differences in the VS−mpv and mpv+VS−VS pipeline pairs, since mpv+VS ⊃ VS in terms of sets of operations.)
As stated in an NVIDIA blog post, it only takes around ~1 ms for a 4 MB block to transfer from GPU to CPU and vice versa.
Where could I be wrong?
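For what it's worth, a back-of-envelope check of that ~1 ms / 4 MB figure (assuming a 1080p frame with 3 planes at 16 bits per sample; the resolution and format are my assumptions, not from the thread):

```python
# Back-of-envelope: does raw PCIe transfer time explain the fps drop?
frame_mb = 1920 * 1080 * 3 * 2 / 2**20   # ~11.9 MB per frame (1080p, 3 planes, 16-bit)
transfer_ms = frame_mb / 4.0              # at ~1 ms per 4 MB: ~3 ms per direction
round_trip_ms = 2 * transfer_ms           # download + upload: ~6 ms per frame

# Observed per-frame time lost going from 24 fps to 14.53 fps
gap_ms = 1000 / 14.53 - 1000 / 24.0       # ~27 ms per frame

print(f'{round_trip_ms:.1f} ms explained by bandwidth vs {gap_ms:.1f} ms observed')
```

If these numbers are roughly right, raw bandwidth accounts for only ~6 ms of the ~27 ms per-frame gap, so the rest would have to come from synchronization, format conversion, and pipeline stalls rather than the copies themselves.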
Also, as far as I know, there are currently no [documented] tools in VapourSynth to take control over memory allocation, i.e. uploading to VRAM before shader execution, which would make it possible to run VS scripts directly on the GPU (like deep-learning tensorflow/torch tensors). Is there any chance of docs for the VS.core module handling that CPU/GPU memory sync? Is it even a good idea to try to implement this through the source code myself?
Thanks for your patience.
Is it a typo? You're using libx265 in mpv, but libx264 in ffmpeg.
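If the codec mismatch was unintentional, a fairer timing comparison would use the same encoder and CRF on both sides. A sketch based on the commands above (flags and file names taken from the thread; trimmed to the encoding-relevant options):

```shell
# mpv path: same VS filter, but encode with libx264 at crf 7 to match ffmpeg
mpv test_in.mp4 --vf=vapoursynth="example.vpy":2:64 --hwdec=nvdec-copy \
    --ovc=libx264 --ovcopts=crf=7 -o test_out_mpv.mp4

# vspipe path: unchanged, already libx264 at crf 7
vspipe --y4m example.vpy - | ffmpeg -i pipe: -c:v libx264 -crf 7 test_out_vspipe.mp4
```

With matching encoders, any remaining fps difference should reflect the pipeline overhead rather than encoder speed.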