
Feature request: Add options for post-inference work, e.g. Softmax

Open fullymiddleaged opened this issue 11 months ago • 1 comment

One thing that would be cool is if we could get some basic post-inference work carried out in the same session. The most obvious candidate I can think of is calculating the softmax values of the logits array. At the moment I apply the softmax to the returned logits array myself in my Python app, but it makes sense to have that work carried out directly in the C++ session instead.
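For context, a minimal sketch of that client-side step (assuming the server returns the logits as a flat array; NumPy is used purely for illustration):

```python
import numpy as np

def softmax(logits):
    """Convert a flat logits array into probabilities."""
    x = np.asarray(logits, dtype=np.float64)
    x = x - x.max()  # shift by the max for numerical stability
    e = np.exp(x)
    return e / e.sum()

print(softmax([2.0, 1.0, 0.1]))  # ~[0.659, 0.242, 0.099]
```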

I was thinking there could be an option flag, something like softmax=true or softmax=false.

e.g. model1:v1(cuda=false, softmax=true)

fullymiddleaged · Jan 21 '25 00:01

Hi @fullymiddleaged, thank you for your great suggestion!

The output format of ONNX models can vary significantly. If the output were always a simple array like [0.x, 0.x, 0.x], applying a softmax operation directly would make sense. However, some models might return more complex outputs, such as {"x": [...], "y": [...]}. In such cases, it's unclear whether the softmax should be applied to x, y, or perhaps another part of the output. This variability makes implementing a universal softmax=true/false flag challenging.

One potential approach could be to add a simple Python binding to the C++ program. This way, users could define custom output filters or transformations, like applying softmax, using Python code. While I haven't personally implemented this kind of functionality before, it seems like an interesting area to explore when I have some time to dive deeper into it.
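As a rough illustration only, a user-defined filter might look something like this on the Python side. Everything here is hypothetical: the hook name filter_output, the dict-shaped outputs, and the "logits" key are assumptions, not an existing API of onnxruntime-server:

```python
import numpy as np

def filter_output(outputs: dict) -> dict:
    """Hypothetical hook: called by the server with the model's raw outputs."""
    # The "logits" key is an assumption; real models name outputs differently.
    logits = np.asarray(outputs["logits"], dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    outputs["probabilities"] = (e / e.sum(axis=-1, keepdims=True)).tolist()
    return outputs
```

Since the user picks which named output to transform, this would sidestep the ambiguity described above.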

Thanks again for sharing this idea!

kibae · Jan 21 '25 18:01