
Does vs-mlrt support multiframe ONNX models or only single image ONNX models?

Open zelenooki87 opened this issue 4 months ago • 8 comments

If it does support them, could you please explain how to run a model whose input shape is, for example, -1x9x-1x-1 and that returns a single frame? Thanks a lot.

zelenooki87 avatar Oct 16 '25 06:10 zelenooki87

It does. RIFE is a built-in example, which requires two frames as input. You only need to shift the clip in time and pass the shifted clips as input, i.e. Model([clip[:-2], clip[1:-1], clip[2:]]) (without padding).
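To see why these shifts line up, here is a minimal sketch using a plain Python list as a stand-in for a clip (the frame numbers are illustrative only, not the VapourSynth API):

```python
# Stand-in "clip": a list of frame indices 0..5.
clip = list(range(6))

# The same slicing suggested above for a 3-frame-input model:
prev, cur, nxt = clip[:-2], clip[1:-1], clip[2:]

# Each output position n sees frames (n, n+1, n+2).
triples = list(zip(prev, cur, nxt))
print(triples)  # [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
```

The same alignment holds for real clips, since VapourSynth clip slicing follows Python slicing semantics.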

WolframRhodium avatar Oct 16 '25 06:10 WolframRhodium

Can this be used with vsmlrt.inference(...) the same way? i.e. clip = vsmlrt.inference([clip[:-2], clip[1:-1], clip[2:]], network_path="<PATH TO THE MODEL FILE>", backend=Backend.TRT(fp16=True, device_id=0, bf16=False, num_streams=1, verbose=True, use_cuda_graph=True, workspace=1073741824, builder_optimization_level=3, engine_folder="J:/TRT")) (I like the convenience of the inference function. :))

for N temporal frames, one could use:

src = [clip[i:-(N - i - 1) or None] for i in range(N)]
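As a quick sanity check (again with plain Python lists standing in for clips), this formula reduces to the explicit three-slice form for N = 3:

```python
clip = list(range(8))  # stand-in for an 8-frame clip
N = 3

# General N-frame slicing; `or None` turns the i = N-1 case
# (stop index 0) into an open-ended slice clip[i:].
src = [clip[i:-(N - i - 1) or None] for i in range(N)]

print(src == [clip[:-2], clip[1:-1], clip[2:]])  # True
```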

Selur avatar Oct 16 '25 12:10 Selur

Yes.

WolframRhodium avatar Oct 16 '25 12:10 WolframRhodium

Nice! 🥇 But... using TRT: clip = vsmlrt.inference([clip[i:-(3 - i - 1) or None] for i in range(3)], network_path="C:/Users/Selur/Desktop/testing/<ModelName>.onnx", overlap=(64, 64), tilesize=[384, 576], backend=Backend.TRT(fp16=True, device_id=0, bf16=False, num_streams=1, verbose=True, use_cuda_graph=False, workspace=1073741824, builder_optimization_level=3, engine_folder="J:/TRT")) # 720x576 it fails for me with:

clip = vsmlrt.inference([clip[i:-(3 - i - 1) or None] for i in range(3)],network_path="C:/Users/Selur/Desktop/testing/<ModelName>.onnx",overlap=(64, 64),tilesize=[384, 576], backend=Backend.TRT(fp16=True,device_id=0,bf16=False,num_streams=1,verbose=True,use_cuda_graph=False,workspace=1073741824,builder_optimization_level=3,engine_folder="J:/TRT")) # 720x576
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:/Hybrid/64bit/vs-mlrt/vsmlrt.py", line 3089, in inference
return inference_with_fallback(
^^^^^^^^^^^^^^^^^^^^^^^^
File "F:/Hybrid/64bit/vs-mlrt/vsmlrt.py", line 3063, in inference_with_fallback
raise e
File "F:/Hybrid/64bit/vs-mlrt/vsmlrt.py", line 3040, in inference_with_fallback
ret = _inference(
^^^^^^^^^^^
File "F:/Hybrid/64bit/vs-mlrt/vsmlrt.py", line 2858, in _inference
engine_path = trtexec(
^^^^^^^^
File "F:/Hybrid/64bit/vs-mlrt/vsmlrt.py", line 2213, in trtexec
raise RuntimeError(f"trtexec execution fails, log has been written to {log_filename}")
RuntimeError: trtexec execution fails, log has been written to C:\Users\Selur\AppData\Local\Temp\trtexec_251016_152814.log

trtexec_251016_152814.log

However, when I use DML: clip = vsmlrt.inference([clip[i:-(3 - i - 1) or None] for i in range(3)], network_path="C:/Users/Selur/Desktop/testing/<ModelName>.onnx", overlap=(64, 64), tilesize=[384, 576], backend=Backend.ORT_DML(fp16=True, device_id=0, num_streams=1)) # 720x576 it works. :)

Selur avatar Oct 16 '25 13:10 Selur

@WolframRhodium Does this mean we can use other open-source interpolation methods in the same way, as long as we convert them to ONNX correctly? Or is some further code adaptation required? Thanks.

zelenooki87 avatar Oct 16 '25 18:10 zelenooki87

@Selur This was an issue with the ONNX model itself, which only supports fixed-size inputs.

Does this mean we can use other open-source interpolation methods in the same way, as long as we convert them to ONNX correctly? Or is some further code adaptation required? Thanks.

One limitation is that vs-mlrt does not provide native support for recurrent models, partly due to the nature of VS itself: filters must serve frame requests in arbitrary order, while recurrent state is inherently sequential. Other than that, if a model can be represented in ONNX correctly, it should be usable.

WolframRhodium avatar Oct 16 '25 23:10 WolframRhodium

@WolframRhodium Is it feasible to load this type of multi-frame model (e.g., RealBasicVSR)? Selur already implemented a MultiInput option in Hybrid for the initial case I mentioned, but with this model (which is, as far as I can tell, 100% correct) I'm getting the following error: vapoursynth.Error: operator (): input dimension must be 4. If these models with a "Time" dimension are, in fact, recurrent models, is there any trick, wrapper, or other method to load them into vs-mlrt while maintaining temporal consistency? Thanks a lot.

[attached image]

zelenooki87 avatar Oct 22 '25 15:10 zelenooki87

RealBasicVSR can be loaded, but generating the correct ONNX in this case is non-trivial. There is no simple trick to correctly handle recurrent models in VS, as far as I know.

WolframRhodium avatar Oct 23 '25 05:10 WolframRhodium