[Performance] batch inference slower than frame inference
Describe the issue
| batch_size | batch_cost_time | frame_cost_time |
|---|---|---|
| 1 | 207 | 203 |
| 2 | 600 | 406 |
| 3 | 914 | 627 |
| 4 | 1234 | 855 |
| 5 | 1570 | 1106 |
| 6 | 1868 | 1267 |
| 7 | 2297 | 1779 |
| 8 | 2657 | 1779 |
| 9 | 2953 | 2096 |
| 10 | 3320 | 2260 |
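For context, "frame" here presumably means running the same items one at a time with batch size 1. Below is a minimal sketch of how the two columns could be measured; `run_batched` and `run_single` are hypothetical stand-ins for the `Run()` call shown under "To reproduce", and the times are in milliseconds:

```cpp
#include <chrono>

// Hypothetical helper: wall-clock time of fn() in milliseconds.
template <typename Fn>
long long time_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

// Usage sketch (run_batched/run_single are placeholders for the Run()
// call shown under "To reproduce"):
//   long long batch_cost = time_ms([&] { run_batched(batch_size); });
//   long long frame_cost = time_ms([&] {
//       for (int i = 0; i < batch_size; ++i) run_single(i);
//   });
```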
To reproduce
```cpp
// Shapes for the batched inputs: embeddings are [batch, seq, dim],
// the attention mask is [batch, seq].
std::array<int64_t, 3> embedding_shape{ batch_size, seq_len, embedding_dim };
std::array<int64_t, 2> mask_shape{ batch_size, seq_len };

// Wrap the existing host buffers as ORT tensors (no copy is made).
Ort::Value embedding_tensor = Ort::Value::CreateTensor<float>(
    embedding_memory_info,
    embeddings.data(),
    embeddings.size(),
    embedding_shape.data(),
    embedding_shape.size());
Ort::Value mask_tensor = Ort::Value::CreateTensor<int64_t>(
    mask_memory_info,
    mask.data(),
    mask.size(),
    mask_shape.data(),
    mask_shape.size());

std::array<Ort::Value, 2> ort_reranker_input = { std::move(embedding_tensor), std::move(mask_tensor) };

try {
    // Run the whole batch in a single session call.
    auto ort_reranker_output = bce_nome5_model->session()->Run(
        Ort::RunOptions{nullptr},
        bce_nome5_model->input_names().data(),
        ort_reranker_input.data(),
        ort_reranker_input.size(),
        bce_nome5_model->output_names().data(),
        bce_nome5_model->output_names().size());

    // Copy the first output tensor into the result vector.
    float* reranker_output = ort_reranker_output[0].GetTensorMutableData<float>();
    output.assign(reranker_output,
                  reranker_output + ort_reranker_output[0].GetTensorTypeAndShapeInfo().GetElementCount());
}
catch (const std::exception& err) {
    std::cout << err.what() << std::endl;
    return -1;
}
```
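On the default CPU execution provider, near-linear growth of the batch cost with batch size usually suggests the batched matmuls are not being parallelized, so the threading setup is worth checking. A minimal sketch of the session options involved; the thread count of 4 is an arbitrary example, not a recommendation:

```cpp
#include <onnxruntime_cxx_api.h>

Ort::Session make_session(Ort::Env& env) {
    Ort::SessionOptions session_options;
    // Threads used inside a single operator; batched matmuls need
    // intra-op threads to speed up. 4 is an arbitrary example value.
    session_options.SetIntraOpNumThreads(4);
    // Enable all graph-level optimizations before session creation.
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
    return Ort::Session(env, L"model.onnx", session_options);  // placeholder path
}
```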
Urgency
No response
Platform
Windows
OS Version
10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.17.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
12-layer transformer
Is this a quantized model?
No
Hello, I cannot find a batch inference sample. Can you give me one? Thanks.
You can use BERT, or any transformer model.
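There is no complete sample in this thread, but the following is a minimal, self-contained sketch of batched inference with the C++ API for a BERT-style model. The model path and the input/output names (`input_ids`, `attention_mask`, `logits`) are assumptions; query the real names with `GetInputNameAllocated` before using it.

```cpp
#include <onnxruntime_cxx_api.h>

#include <array>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    constexpr int64_t batch_size = 4;  // sequences per Run() call
    constexpr int64_t seq_len = 128;   // padded sequence length

    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "batch-sample");
    Ort::SessionOptions opts;
    Ort::Session session(env, L"model.onnx", opts);  // placeholder path

    // Flat [batch_size, seq_len] host buffers; a real program would fill
    // input_ids with tokenizer output instead of zeros.
    std::vector<int64_t> input_ids(batch_size * seq_len, 0);
    std::vector<int64_t> attention_mask(batch_size * seq_len, 1);
    std::array<int64_t, 2> shape{batch_size, seq_len};

    Ort::MemoryInfo mem_info =
        Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    std::array<Ort::Value, 2> inputs = {
        Ort::Value::CreateTensor<int64_t>(mem_info, input_ids.data(),
                                          input_ids.size(), shape.data(),
                                          shape.size()),
        Ort::Value::CreateTensor<int64_t>(mem_info, attention_mask.data(),
                                          attention_mask.size(), shape.data(),
                                          shape.size())};

    // These names are assumptions for a BERT-style model; query the real
    // ones with session.GetInputNameAllocated(i, allocator).
    const char* input_names[] = {"input_ids", "attention_mask"};
    const char* output_names[] = {"logits"};

    auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names,
                               inputs.data(), inputs.size(), output_names, 1);

    // The first output holds one row per batch element.
    const float* data = outputs[0].GetTensorData<float>();
    size_t count = outputs[0].GetTensorTypeAndShapeInfo().GetElementCount();
    std::cout << "first value: " << data[0]
              << ", total elements: " << count << "\n";
    return 0;
}
```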