
✨[Feature] sampling within the model

Open jjh42 opened this issue 8 months ago • 3 comments

I'm trying to export a model that uses torch.distributions.Categorical(logits=...).sample() to sample from the logits.

I currently have a (fixed-length) loop inside a torch.compile graph that samples from the logits to choose an output token and feeds it back in as the next input (a standard autoregressive model).

I see the examples in this repo (GPT-2, etc.) all use greedy decoding (i.e. they are not stochastic), and trying to export my model gives an error:

raise UnsupportedOperatorException(
torch_tensorrt.dynamo.conversion._TRTInterpreter.UnsupportedOperatorException: Conversion of function torch._ops.aten.aten::multinomial not currently supported!
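For reference, a minimal sketch of the kind of module involved (the class name, sizes, and compile options here are illustrative placeholders, not my actual code):

import torch
import torch_tensorrt

class Sampler(torch.nn.Module):
    def __init__(self, hidden: int = 64, vocab_size: int = 32):
        super().__init__()
        self.proj = torch.nn.Linear(hidden, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.proj(x)
        # Categorical.sample() lowers to aten::multinomial, which has no
        # TensorRT converter, so compilation fails here.
        return torch.distributions.Categorical(logits=logits).sample()

model = Sampler().eval().cuda()
example = torch.randn(1, 64).cuda()
trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=[example])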

Is there any workaround, or is sampling not currently possible in TensorRT? I know you can sample outside the model, but in my case the sampling is much better encapsulated inside the model.

This can be considered a feature request to support multinomial in torch_tensorrt, I guess.

jjh42 avatar Apr 01 '25 19:04 jjh42

Can you provide a reproducer of this issue? The simplest approach is probably to run the sampling in a PyTorch block, since I'm not sure TRT can handle it. What is odd here is that you are getting past capability partitioning.

You can try passing torch_executed_ops=[torch.ops.aten.multinomial].
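Roughly like this (a sketch assuming the Dynamo frontend; the model and inputs are placeholders):

import torch
import torch_tensorrt

# Keep aten::multinomial running in PyTorch so the rest of the graph
# can still be lowered to TensorRT engines.
trt_model = torch_tensorrt.compile(
    model,                     # your nn.Module with the sampling loop
    ir="dynamo",
    inputs=[example_input],
    torch_executed_ops={torch.ops.aten.multinomial.default},
)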

narendasan avatar Apr 01 '25 21:04 narendasan

Thanks, I'll put together a repro soon; I think it's really anything that uses a Categorical.

If you use torch_executed_ops, does that mean you won't be able to run the model with the TensorRT C++ runtime?

jjh42 avatar Apr 02 '25 22:04 jjh42

You can trace with torch.jit.trace and still use it with libtorchtrt_runtime.
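A sketch of that flow (file names and inputs are illustrative):

import torch
import torch_tensorrt

# Compile with the PyTorch fallback for multinomial, then trace the result
# to TorchScript so it can be loaded from C++ with libtorchtrt_runtime linked in.
trt_gm = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[example_input],
    torch_executed_ops={torch.ops.aten.multinomial.default},
)
traced = torch.jit.trace(trt_gm, example_input)
torch.jit.save(traced, "trt_model.ts")
# In C++: torch::jit::load("trt_model.ts") with libtorchtrt_runtime on the link line.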

narendasan avatar Apr 03 '25 23:04 narendasan