Make ORTModel PyTorch free
Feature request
Currently, ORTModel has a hard dependency on torch and transformers. Could we make this dependency soft, so that Optimum + ORTModel can be used without PyTorch? Would this be useful and elegant?
Motivation
One of the reasons ONNX Runtime is nice is that it is much lighter than PyTorch. Forcing a dependency on PyTorch gives up one of the advantages of ONNX Runtime.
One difficulty is that encoder-decoder models use generate() from transformers.
Your contribution
None ATM, just an idea I have in mind, not sure it makes sense to commit to this.
Related issues: https://github.com/huggingface/optimum/issues/524 https://github.com/microsoft/onnxruntime/issues/13808
Looking back, understanding and sharing the implications of this design choice upfront would have helped avoid many issues.
It definitely makes sense to commit to this. In our production environment we use transformers + onnx, and it's great that transformers does not have a hard dependency on torch or tensorflow. Now we want to use ORTModelForSeq2SeqLM, and it's a bit frustrating that we need to install the heavy torch package even though it is not going to be used at all.
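For context, the usage described here looks roughly like the sketch below. Even though inference runs through ONNX Runtime, the pipeline currently goes through torch tensors and the transformers generation loop, which is why the torch install cannot be avoided. The model name is illustrative, and the export flag may be named export=True instead of from_transformers=True depending on the Optimum version.

```python
# Sketch of the workflow described above: ONNX inference via Optimum,
# which today still requires torch to be installed.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
# Export the PyTorch checkpoint to ONNX on the fly (flag name varies by Optimum version).
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# generate() comes from transformers and works on torch tensors,
# hence the hard dependency discussed in this issue.
inputs = tokenizer("translate English to French: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```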
@vgrabovets Thanks for the feedback! Yes, it's far from ideal to have this dependency on torch. We would need to make generate() (or part of it) torch.jit.script-able to get rid of the dependency, or reimplement it so that it is scriptable.
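To make the "reimplement it" option concrete, below is a minimal greedy-decoding sketch that uses only numpy and onnxruntime against the encoder and decoder graphs of a typical seq2seq export. This is not Optimum's implementation: the file names and input/output names (input_ids, encoder_hidden_states, encoder_attention_mask, ...) are assumptions that need to match the actual graph signatures, and it skips past-key-values caching, sampling, and beam search.

```python
import numpy as np
import onnxruntime as ort

# Encoder and decoder graphs of a seq2seq ONNX export (file names are assumptions).
encoder = ort.InferenceSession("encoder_model.onnx", providers=["CPUExecutionProvider"])
decoder = ort.InferenceSession("decoder_model.onnx", providers=["CPUExecutionProvider"])

def greedy_generate(input_ids, decoder_start_token_id, eos_token_id, max_length=64):
    """Torch-free greedy decoding; input_ids is an int64 numpy array of shape (batch, seq)."""
    attention_mask = np.ones_like(input_ids)
    # Run the encoder once and reuse its hidden states at every decoding step.
    encoder_hidden_states = encoder.run(
        None, {"input_ids": input_ids, "attention_mask": attention_mask}
    )[0]

    decoder_input_ids = np.full((input_ids.shape[0], 1), decoder_start_token_id, dtype=np.int64)
    for _ in range(max_length):
        # No past-key-values cache here: the whole prefix is re-decoded at each step.
        logits = decoder.run(
            None,
            {
                "input_ids": decoder_input_ids,
                "encoder_attention_mask": attention_mask,
                "encoder_hidden_states": encoder_hidden_states,
            },
        )[0]
        next_tokens = logits[:, -1, :].argmax(axis=-1).astype(np.int64)[:, None]
        decoder_input_ids = np.concatenate([decoder_input_ids, next_tokens], axis=-1)
        if (next_tokens == eos_token_id).all():
            break
    return decoder_input_ids
```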
Another option is to use the BeamSearch / GreedySearch operators available in ONNX Runtime. The issue is that they are specific to the CPUExecutionProvider and CUDAExecutionProvider, so they are not very flexible, for example if you want to use TensorRT or NVIDIA Triton. Reference: https://github.com/huggingface/optimum/issues/558
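For reference, a model whose generation loop has been fused into ONNX Runtime's BeamSearch contrib op (for example via onnxruntime's convert_generation tooling) can then be driven with plain numpy inputs through a regular InferenceSession, with no torch involved. This is a hedged sketch: the file name is an assumption and the input names follow the BeamSearch op description, so they should be checked against the actual exported graph.

```python
import numpy as np
import onnxruntime as ort

# Model with the generation loop fused into a BeamSearch node (file name is an assumption).
session = ort.InferenceSession("t5_with_beam_search.onnx", providers=["CPUExecutionProvider"])

# Generation hyperparameters are passed as graph inputs rather than Python arguments.
outputs = session.run(
    None,
    {
        "input_ids": np.array([[21603, 10, 8774, 6, 296, 55, 1]], dtype=np.int32),  # placeholder token ids
        "max_length": np.array([64], dtype=np.int32),
        "min_length": np.array([1], dtype=np.int32),
        "num_beams": np.array([4], dtype=np.int32),
        "num_return_sequences": np.array([1], dtype=np.int32),
        "length_penalty": np.array([1.0], dtype=np.float32),
        "repetition_penalty": np.array([1.0], dtype=np.float32),
    },
)
sequences = outputs[0]  # shape: (batch, num_return_sequences, max_length)
```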
Hi, are there any updates on this issue?
Recently I converted my transformers PyTorch model to ONNX and tried to package it with cx_Freeze. It is very frustrating that the generated package is 1.4 GB, with the torch library consuming nearly 1 GB. The rationale for using ONNX Runtime is weakened if we still need torch to distribute the package.