vs-realesrgan
Performance so much worse than using trt.Model
Hi,
I just compared:
clip = core.resize.Bicubic(clip, width=720, height=574, format=vs.RGBS, matrix_in_s='709')
clip = core.trt.Model(clip, engine_path="realesr-general-wdn-x4v3_opset16_574x720_fp16.engine", num_streams=4, device_id=0)
created with
trtexec --fp16 --onnx=./realesr-general-wdn-x4v3_opset16.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x574x720 --maxShapes=input:1x3x574x720 --saveEngine=./realesr-general-wdn-x4v3_opset16_574x720_fp16.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --useCudaGraph --noDataTransfers --builderOptimizationLevel=5 --infStreams=4
gives me:
vspipe -e 300 test3.vpy -p .
Script evaluation done in 2.26 seconds
Output 301 frames in 8.15 seconds (36.92 fps)
using vs-realesrgan instead:
clip = core.resize.Bicubic(clip, width=720, height=574, format=vs.RGBH, matrix_in_s='709')
clip = realesrgan(clip, num_streams = 4, trt = True, model = 5, denoise_strength = 0)
gives me:
vspipe -e 300 test3.vpy -p .
Script evaluation done in 2.26 seconds
Output 601 frames in 23.73 seconds (25.33 fps)
How come using core.trt.Model is almost 50% faster?
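For reference, here is a quick sketch of how the "almost 50%" figure follows from the two vspipe results above. The variable names are only illustrative, and the fps values are copied from the reported output rather than re-measured:

```python
# fps values as reported by the two vspipe runs above
trt_model_fps = 36.92       # core.trt.Model with the prebuilt engine
vs_realesrgan_fps = 25.33   # realesrgan(..., trt=True, model=5)

# Relative speedup of core.trt.Model over vs-realesrgan
speedup = trt_model_fps / vs_realesrgan_fps
percent_faster = (speedup - 1.0) * 100.0

print(f"core.trt.Model: {speedup:.2f}x the throughput, "
      f"{percent_faster:.1f}% faster")
```

Note that the two runs report different frame counts (301 vs 601), so the comparison relies on the fps values rather than the wall-clock times.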