
Make ORTModel PyTorch free

Open fxmarty opened this issue 1 year ago • 5 comments

Feature request

Currently, ORTModel has a hard dependency on torch and transformers. Could we make this dependency soft, so that Optimum + ORTModel can be used without PyTorch? Would this be useful and elegant?
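
For illustration, here is a minimal sketch of the kind of optional-import guard a soft dependency would imply (modelled on how transformers gates its backends; the function names below are hypothetical, not existing optimum APIs):

```python
import importlib.util


def is_torch_available() -> bool:
    # Check for torch without importing it, so torch-free installs still work.
    return importlib.util.find_spec("torch") is not None


def to_numpy(value):
    # Accept either a torch.Tensor or a numpy array; torch is only imported if present.
    if is_torch_available():
        import torch

        if isinstance(value, torch.Tensor):
            return value.cpu().numpy()
    return value
```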

Motivation

One of the reasons ONNX Runtime is nice is that it is much lighter than PyTorch. By forcing a dependency on PyTorch, one of the advantages of ONNX Runtime is lost.

One difficulty is that encoder-decoder models use generate() from transformers.
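
To make the difficulty concrete, a torch-free path would have to reimplement the decoding loop on top of numpy + onnxruntime. A rough greedy-decoding sketch, assuming an exported decoder graph with an `input_ids` input and a `logits` output (the names and file path are assumptions for illustration, not what optimum exports today):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("decoder_model.onnx", providers=["CPUExecutionProvider"])


def greedy_generate(input_ids: np.ndarray, max_new_tokens: int, eos_token_id: int) -> np.ndarray:
    # Greedy decoding only; transformers' generate() also covers beam search,
    # sampling, constraints, etc., which is what makes removing it non-trivial.
    for _ in range(max_new_tokens):
        logits = session.run(["logits"], {"input_ids": input_ids})[0]  # (batch, seq, vocab)
        next_token = logits[:, -1, :].argmax(axis=-1, keepdims=True).astype(input_ids.dtype)
        input_ids = np.concatenate([input_ids, next_token], axis=1)
        if (next_token == eos_token_id).all():
            break
    return input_ids
```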

Your contribution

None ATM, just an idea I have in mind, not sure it makes sense to commit to this.

fxmarty avatar Nov 29 '22 14:11 fxmarty

Related issues: https://github.com/huggingface/optimum/issues/524 https://github.com/microsoft/onnxruntime/issues/13808

Looking back, understanding and sharing the implications of this design choice upfront would have helped avoid many issues.

fxmarty avatar Dec 03 '22 08:12 fxmarty

It definitely makes sense to commit to this. In our production environment we use transformers + onnx, and it's great that transformers does not have a hard dependency on torch or tensorflow. Now we want to use ORTModelForSeq2SeqLM, and it's a bit frustrating that we need to install the heavy torch package even though it is not going to be used at all.

vgrabovets avatar Dec 27 '22 10:12 vgrabovets

@vgrabovets Thanks for the feedback! Yes, it's far from ideal to have this dependency on torch. We would need to make generate() (or part of it) torch.jit.script-able to get rid of the dependency (or reimplement it so that it is scriptable).
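
As a toy illustration of the "scriptable generate()" idea, a greedy loop wrapped in a module that torch.jit.script accepts (the decoder below is a stand-in, not a real transformers model):

```python
import torch


class ToyDecoder(torch.nn.Module):
    def __init__(self, vocab_size: int = 32, hidden_size: int = 16):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, hidden_size)
        self.lm_head = torch.nn.Linear(hidden_size, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.lm_head(self.embed(input_ids))  # (batch, seq, vocab)


class GreedyGenerator(torch.nn.Module):
    def __init__(self, decoder: ToyDecoder):
        super().__init__()
        self.decoder = decoder

    def forward(self, input_ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
        # Fixed-length greedy loop, kept simple so TorchScript can compile it.
        for _ in range(max_new_tokens):
            logits = self.decoder(input_ids)
            next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_token], dim=1)
        return input_ids


scripted = torch.jit.script(GreedyGenerator(ToyDecoder()))
print(scripted(torch.tensor([[1, 2, 3]]), 4))
```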

Another option is to use the BeamSearch / GreedySearch operators available in ONNX Runtime. The issue is that they are specific to CPUExecutionProvider and CUDAExecutionProvider, so they are not very flexible, for example if you want to use TensorRT or NVIDIA Triton. Reference: https://github.com/huggingface/optimum/issues/558
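
For context, execution providers are chosen at session creation time, which is where this restriction shows up (file names below are hypothetical; a graph embedding the fused BeamSearch/GreedySearch ops only has CPU/CUDA kernels, so other providers cannot run those ops natively):

```python
import onnxruntime as ort

# Works: the fused search ops have CPU (and CUDA) implementations.
sess_cpu = ort.InferenceSession(
    "model_with_beam_search.onnx", providers=["CPUExecutionProvider"]
)

# With TensorRT in the provider list, the search ops have no TensorRT kernel,
# so they fall back to CUDA/CPU instead of benefiting from TensorRT.
sess_trt = ort.InferenceSession(
    "model_with_beam_search.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)
```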

fxmarty avatar Jan 02 '23 15:01 fxmarty

Hi, are there any updates on this issue?

Recently I've converted my transformers PyTorch model to ONNX and am trying to package it with cx_Freeze. It is very frustrating that the generated package is 1.4GB, with the torch library consuming nearly 1GB of that. The case for using ONNX Runtime is weakened if we still need torch to distribute the package.

[screenshot: package size breakdown of the cx_Freeze build, with torch taking roughly 1GB of the 1.4GB total]

sappho192 avatar Jan 05 '24 08:01 sappho192