
[Performance] batch inference slower than frame inference

Open busishengui opened this issue 1 year ago • 3 comments

Describe the issue

| batch_size | batch_cost_time | frame_cost_time |
| --- | --- | --- |
| 1 | 207 | 203 |
| 2 | 600 | 406 |
| 3 | 914 | 627 |
| 4 | 1234 | 855 |
| 5 | 1570 | 1106 |
| 6 | 1868 | 1267 |
| 7 | 2297 | 1779 |
| 8 | 2657 | 1779 |
| 9 | 2953 | 2096 |
| 10 | 3320 | 2260 |

To reproduce

		std::array<int64_t, 3> embedding_shape{ batch_size, seq_len, embedding_dim };
		std::array<int64_t, 2> mask_shape{ batch_size, seq_len };
		Ort::Value embedding_tensor = Ort::Value::CreateTensor<float>(
			embedding_memory_info,
			embeddings.data(),
			embeddings.size(),
			embedding_shape.data(),
			embedding_shape.size()
		);
		Ort::Value mask_tensor = Ort::Value::CreateTensor<int64_t>(
			mask_memory_info,
			mask.data(),
			mask.size(),
			mask_shape.data(),
			mask_shape.size()
		);

		std::array<Ort::Value, 2> ort_reranker_input = { std::move(embedding_tensor), std::move(mask_tensor) };
		try {
			auto ort_reranker_output = bce_nome5_model->session()->Run(Ort::RunOptions{nullptr},
				bce_nome5_model->input_names().data(),
				ort_reranker_input.data(),
				ort_reranker_input.size(),
				bce_nome5_model->output_names().data(),
				bce_nome5_model->output_names().size());
			float* reranker_output = ort_reranker_output[0].GetTensorMutableData<float>();
			output.assign(reranker_output, reranker_output + ort_reranker_output[0].GetTensorTypeAndShapeInfo().GetElementCount());
		}
		catch (const std::exception& err) {
			std::cout << err.what() << std::endl;
			return -1;
		}

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

12-layer transformer

Is this a quantized model?

No

busishengui avatar Apr 02 '24 05:04 busishengui

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions[bot] avatar May 02 '24 15:05 github-actions[bot]

Hello, I cannot find a batch inference sample. Could you give me one? Thanks.

Bruce-WangGF avatar May 21 '24 01:05 Bruce-WangGF

> Hello, I cannot find a batch inference sample. Could you give me one? Thanks.

You can use BERT, or any transformer model.

busishengui avatar May 21 '24 06:05 busishengui