Feature request: Add options for post-inference work, e.g. softmax
One thing that would be cool is getting some basic post-inference work carried out in the same session. The most obvious example I can think of is computing softmax values from the logits array. At the moment I apply softmax to the returned logits myself in my Python app, but it would make sense for that work to happen directly in the C++ session instead.
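For reference, the post-processing I do in my Python app today looks roughly like this (a minimal NumPy sketch; the helper name is mine):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: shift by the max before exponentiating
    so large logits don't overflow exp()."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Example: turn raw logits into a probability distribution.
probs = softmax(np.array([2.0, 1.0, 0.1]))
```

The max-subtraction step changes nothing mathematically but keeps the computation stable, which is exactly the kind of detail that would be nice to have handled once in the C++ session rather than in every client app.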
I was thinking there could be an option flag, e.g. softmax=true or softmax=false.
e.g. model1:v1(cuda=false, softmax=true)
Hi @fullymiddleaged, thank you for your great suggestion!
The output format of ONNX models can vary significantly. If the output were always a simple array like [0.x, 0.x, 0.x], applying a softmax operation directly would make sense. However, some models might return more complex outputs, such as {"x": [...], "y": [...]}. In such cases, it's unclear whether the softmax should be applied to x, y, or perhaps another part of the output. This variability makes implementing a universal softmax=true/false flag challenging.
One potential approach could be to add a simple Python binding to the C++ program. This way, users could define custom output filters or transformations, like applying softmax, using Python code. While I haven't personally implemented this kind of functionality before, it seems like an interesting area to explore when I have some time to dive deeper into it.
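As a rough sketch of what such a binding might look like from the user's side (the filter_output hook name and the dict-shaped output are purely hypothetical, not an existing API):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    exps = np.exp(x - np.max(x))
    return exps / exps.sum()

# Hypothetical user-defined hook: the C++ session would call this on the
# raw model output before returning it to the caller. Because the user
# writes it, *they* decide which part of a complex output (here "y")
# softmax applies to.
def filter_output(output: dict) -> dict:
    return {**output, "y": softmax(np.asarray(output["y"]))}

filtered = filter_output({"x": [1, 2], "y": [2.0, 1.0, 0.1]})
```

This sidesteps the ambiguity described above: instead of the server guessing where a universal softmax=true flag should apply, the transformation is explicit in user code.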
Thanks again for sharing this idea!