
ONNX model outputs an array of null

sidharthg-couture opened this issue 11 months ago • 8 comments

I have configured my ONNX model (based on stella_400M) to have a fixed batch size for inputs and outputs, and the server even returns a 200 response when I run it.

My issue is that the entire output is an array of nulls of size 1024 (the size itself is expected, since 1024 is the model's output dimension, but every value is null). When I use the same ONNX model with the Python ONNXRuntime library, it works as expected.

  • Output from Python ONNX-Runtime
array([ 0.10638401,  0.09311324, -0.555768  , ...,  0.99843943,
        0.2404    ,  0.164792  ], shape=(1024,), dtype=float32)
  • Output from ONNX-Runtime on onnxruntime-server
Response Body: {'sentence_embedding': [[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, ... None, None]
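
For reference, the Python-side check was done with a session roughly like the sketch below (the file name and the dummy token IDs are placeholders, not the exact code used):

import numpy as np
import onnxruntime as ort

# Placeholder path and sequence length; adjust to the actual export.
sess = ort.InferenceSession("model_static.onnx", providers=["CPUExecutionProvider"])

seq_len = 16
inputs = {
    "input_ids": np.ones((1, seq_len), dtype=np.int64),
    "attention_mask": np.ones((1, seq_len), dtype=np.int64),
    "token_type_ids": np.zeros((1, seq_len), dtype=np.int64),
}

# Running the same model directly in Python returns real float32 values.
(embedding,) = sess.run(["sentence_embedding"], inputs)
print(embedding.shape, embedding.dtype)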

For context, I converted the Stella-400M model to ONNX with the following configuration:

torch.onnx.export(
    wrapper_model,
    (input_data['input_ids'][:1], input_data['attention_mask'][:1], input_data['token_type_ids'][:1]),
    os.path.join(base_model_path, 'model_static.onnx'),
    export_params=True,
    do_constant_folding=True,
    input_names=['input_ids', 'attention_mask', 'token_type_ids'],
    output_names=['sentence_embedding'],
    dynamic_axes={
        'input_ids': {1: 'sequence_length'},
        'attention_mask': {1: 'sequence_length'},
        'token_type_ids': {1: 'sequence_length'},
    },
    opset_version=18,
)

I am running the kibaes/onnxruntime-server:1.20.1b-linux-cpu docker image.
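
The request that produces the 200-with-nulls response looks roughly like the sketch below (host, port, and the dummy token IDs are placeholders, and I'm assuming the execute endpoint is POST /api/sessions/{model}/{version}):

import numpy as np
import requests

# Placeholder host/port; adjust to how the container is exposed.
BASE_URL = "http://localhost:8080"

seq_len = 16
payload = {
    "input_ids": np.ones((1, seq_len), dtype=np.int64).tolist(),
    "attention_mask": np.ones((1, seq_len), dtype=np.int64).tolist(),
    "token_type_ids": np.zeros((1, seq_len), dtype=np.int64).tolist(),
}

# Model name and version are illustrative.
resp = requests.post(f"{BASE_URL}/api/sessions/stella-v3/model", json=payload)
print(resp.status_code)
print(resp.json()["sentence_embedding"])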

Would you have any idea why this would be happening? Or any possible fixes? Thanks!

sidharthg-couture avatar Jan 22 '25 22:01 sidharthg-couture

Hello. @sidharthg-couture :)

Could you please share the ONNX session information? It can be queried via HTTP GET /api/sessions or HTTP GET /api/sessions/{model}/{version}; the response from either endpoint would be helpful.
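
For example (host and port are placeholders, adjust to your deployment):

import requests

# Placeholder host/port for the onnxruntime-server HTTP API.
BASE_URL = "http://localhost:8080"

# List every loaded session; a single session can also be queried by model/version.
print(requests.get(f"{BASE_URL}/api/sessions").json())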

kibae avatar Feb 01 '25 18:02 kibae

Hi, here is the response from the API:

[
  {
    "created_at": 1738662696,
    "execution_count": 0,
    "inputs": {
      "attention_mask": "int64[1,-1]",
      "input_ids": "int64[1,-1]",
      "token_type_ids": "int64[1,-1]"
    },
    "last_executed_at": 0,
    "model": "stella-v3",
    "option": {
      "cuda": false
    },
    "outputs": {
      "sentence_embedding": "float32[-1,1024]"
    },
    "version": "model"
  }
]

sidharthg-couture avatar Feb 04 '25 09:02 sidharthg-couture

Hello, @sidharthg-couture Thank you for providing the information. :)

According to the response from the Python code, the shape appears to be (1024, -1).

  • Output from Python ONNX-Runtime
array([ 0.10638401,  0.09311324, -0.555768  , ...,  0.99843943,
        0.2404    ,  0.164792  ], shape=(1024,), dtype=float32)

However, the response from the ONNX session API indicates that the shape of sentence_embedding is (-1, 1024).

    "outputs": {
      "sentence_embedding": "float32[-1,1024]"
    },

How about modifying the dynamic_axes values during the ONNX export so that the shape becomes (-1, 1024)?

    dynamic_axes={
        'input_ids': {1: 'sequence_length'},
        'attention_mask': {1: 'sequence_length'},
        'token_type_ids': {1: 'sequence_length'},
        'sentence_embedding': {0: 'sequence_length'},
    },
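
After re-exporting, you can check which shape the graph actually declares with something like this (a quick sketch, assuming the onnx Python package and the file name from your export snippet):

import onnx

# Load the exported graph and print the declared dimensions of each output.
model = onnx.load("model_static.onnx")
for out in model.graph.output:
    dims = [d.dim_param or d.dim_value for d in out.type.tensor_type.shape.dim]
    print(out.name, dims)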

kibae avatar Feb 04 '25 10:02 kibae

Do you mean (1024, -1) here, in the last bit about the ONNX export? If not, isn't the shape already (-1, 1024)?

sidharthg-couture avatar Feb 04 '25 10:02 sidharthg-couture

In your Python code, the shape is specified as (1024,), which implies a shape of (1024, -1). However, the session information from onnxruntime-server shows the sentence_embedding output as float32[-1, 1024]. Therefore, when exporting to ONNX, you should add 'sentence_embedding': {0: 'sequence_length'} so that onnxruntime-server's session information displays it as float32[1024, -1].

kibae avatar Feb 05 '25 09:02 kibae

To make axis [1] dynamic, I would have to use 'sentence_embedding': {1: 'sequence_length'}, right? Otherwise it would be

"outputs": {
    "sentence_embedding": "float32[-1,1024]"
  },

This was the output after trying the suggested dynamic_axes configuration.

sidharthg-couture avatar Feb 14 '25 13:02 sidharthg-couture

@sidharthg-couture I released version 1.21.0, would you like to try again with the new version?

kibae avatar Mar 11 '25 00:03 kibae

Hello @sidharthg-couture, I think I’ve identified the problem. Actually, you need to respect the order of the names provided by the API. The API gives you:

"inputs": {
      "attention_mask": "int64[1,-1]",
      "input_ids": "int64[1,-1]",
      "token_type_ids": "int64[1,-1]"
}

But when you list the names in your Python code (the same applies in C++), the order is:

'input_ids': {1: 'sequence_length'},
'attention_mask': {1: 'sequence_length'},
'token_type_ids': {1: 'sequence_length'},

It doesn’t throw a dimension error because, for a [1,-1] input, the API just takes the values it is given; the inputs aren't filled based on their names but on their position. So the only way to fix this is to change the order of the inputs to match EXACTLY what the API expects.
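
As a sketch of that workaround (host, port, and token values are placeholders; that the server consumes inputs positionally is my reading of the behaviour, not something documented):

import numpy as np
import requests

# Placeholder host/port; model/version names are the ones from the session info above.
BASE_URL = "http://localhost:8080"

seq_len = 16
values = {
    "input_ids": np.ones((1, seq_len), dtype=np.int64).tolist(),
    "attention_mask": np.ones((1, seq_len), dtype=np.int64).tolist(),
    "token_type_ids": np.zeros((1, seq_len), dtype=np.int64).tolist(),
}

# Re-key the payload in the exact order reported by GET /api/sessions
# (attention_mask, input_ids, token_type_ids). Python dicts preserve insertion
# order, and requests serializes the JSON body in that same order.
api_order = ["attention_mask", "input_ids", "token_type_ids"]
payload = {name: values[name] for name in api_order}

resp = requests.post(f"{BASE_URL}/api/sessions/stella-v3/model", json=payload)
print(resp.json()["sentence_embedding"][0][:5])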

TAIBIAchraf avatar Sep 26 '25 15:09 TAIBIAchraf