✨[Feature] Reduce overhead from C++ TorchBind operations being dispatched through Python

Open narendasan opened this issue 1 month ago • 0 comments

Is your feature request related to a problem? Please describe.

We are seeing that TorchBind operators from the C++ runtime are being called through Python in order to dispatch, which adds overhead.

Describe the solution you'd like

We want these operators to run entirely in C++ without going back to Python.

Potential solutions: register the operator as a CUDA op; re-export so that the operator does not need to be lifted into Python and runs more like it does in AOTInductor; or switch to an ExecuTorch-style integration rather than TorchBind.

Describe alternatives you've considered

Additional context

Dec 02 '25 18:12 narendasan