
[QST] Edge compute using Transformers4Rec models with ONNX Runtime

fire opened this issue Feb 23 '22 · 5 comments

❓ Questions & Help

Details

I was wondering if there is a tutorial or workflow for using Transformers4Rec models on edge compute with the ONNX Runtime?

I would be interested in collaborating.

fire · Feb 23 '22

@fire hello. Thanks for your question. We do not currently have an example with the ONNX Runtime. You are of course welcome to contribute one.

rnyak · Feb 23 '22

Any suggestions on how to approach this?

  1. Generate a model using Transformers4Rec
  2. ???
  3. Convert the model from TorchScript to an ONNX model (a sketch follows below)
  4. Execute the ONNX model with ONNX Runtime on DirectML (NVIDIA, Intel, AMD), CUDA directly, or CPU.
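
For step 3, a minimal export sketch via `torch.onnx.export`, which traces an eager module (it also accepts a TorchScript `ScriptModule`). The tiny stand-in model, tensor names, and shapes below are hypothetical; a real workflow would export the trained Transformers4Rec module instead:

```python
import torch

# Tiny stand-in for a trained session-based model; a real workflow would
# export the Transformers4Rec-trained torch.nn.Module instead (hypothetical).
class TinyModel(torch.nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, dim)
        self.head = torch.nn.Linear(dim, vocab_size)

    def forward(self, input_ids):
        # Mean-pool the item embeddings, then score every item in the vocab.
        return self.head(self.emb(input_ids).mean(dim=1))

model = TinyModel().eval()
dummy = torch.randint(0, 1000, (1, 20))  # batch of 1, session length 20

torch.onnx.export(
    model, (dummy,), "model.onnx",
    input_names=["input_ids"], output_names=["logits"],
    # Mark batch and sequence axes as dynamic so edge inputs can vary in size.
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                  "logits": {0: "batch"}},
    opset_version=13,
)
```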

fire · Feb 23 '22

@fire that'd be a question for the NVIDIA Triton team. You can ask your question on the Triton repo. Thanks.

rnyak · Feb 23 '22

I cross-posted to https://github.com/triton-inference-server/server/issues/3976. Feel free to keep this open or close it if you want.

fire · Feb 23 '22

You can just export the trained torch model to ONNX and use ONNX Runtime for inference with the same inputs as used during training. See, for example, this repo for how to do transformer inference in the browser using ONNX Runtime: https://github.com/jobergum/browser-ml-inference
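
As a hedged sketch of that inference step, assuming a `model.onnx` exported as in the earlier sketch, with an input named `input_ids` and an output named `logits` (both assumed names), and selecting only execution providers the installed onnxruntime build actually exposes:

```python
import numpy as np
import onnxruntime as ort

# Prefer GPU providers when the installed build has them, fall back to CPU.
preferred = ["CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
providers = [p for p in preferred if p in ort.get_available_providers()]
session = ort.InferenceSession("model.onnx", providers=providers)

# Feed the model the same input layout it saw during training/export.
input_ids = np.random.randint(0, 1000, size=(1, 20), dtype=np.int64)
(logits,) = session.run(["logits"], {"input_ids": input_ids})
print(logits.shape)  # e.g. (1, 1000) for the toy export above
```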

jobergum · Feb 24 '22