onnxruntime-server

Threadpool query and performance tweaking

Open fullymiddleaged opened this issue 11 months ago • 0 comments

Hey,

So, I've come from a Python implementation where I ran one worker per core (Gunicorn), set intra_op_num_threads = 1 in ONNX Runtime, and tried to pin each session to the core its worker was running on. This seemed to help with performance, as the inference sessions weren't fighting over cores. See snippet below:

import psutil
import onnxruntime as rt

# Core this Gunicorn worker is currently running on
cpu = str(psutil.Process().cpu_num())

sess_opt = rt.SessionOptions()
sess_opt.intra_op_num_threads = 1  # one intra-op thread per worker
# Affinity value passed to ONNX Runtime for this worker's core
cpuoptions = "'session.intra_op_thread_affinities', '" + cpu + "'"
sess_opt.add_session_config_entry('session.intra_op_thread_affinities', cpuoptions)
onnxsession = rt.InferenceSession(config.MODEL_PATH, sess_options=sess_opt, providers=['CPUExecutionProvider'])
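A rough sketch of the same per-core idea, under a few stated assumptions. The helper names (`pin_process_to_core`, `affinity_string`, `make_session`) are hypothetical, and `os.sched_setaffinity` is Linux-only. As far as I can tell from the ONNX Runtime session-options config-key header, `session.intra_op_thread_affinities` takes a semicolon-separated list with one entry per intra-op thread except the main thread (so `intra_op_num_threads - 1` entries), using one-based logical processor IDs. With a single intra-op thread that leaves nothing to list, so the sketch pins the worker process itself instead.

```python
import os


def pin_process_to_core(core: int) -> bool:
    """Pin the calling process to one zero-based core (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # not available on this platform
    os.sched_setaffinity(0, {core})
    return True


def affinity_string(cores, intra_op_num_threads: int) -> str:
    """Build a value for 'session.intra_op_thread_affinities'.

    Assumes the format described above: one entry per intra-op thread
    except the main thread, ';'-separated, one-based processor IDs.
    `cores` holds zero-based core numbers (as psutil reports them).
    """
    needed = intra_op_num_threads - 1
    if len(cores) < needed:
        raise ValueError("need one core per non-main intra-op thread")
    return ";".join(str(c + 1) for c in cores[:needed])


def make_session(model_path: str, core: int):
    """Create a single-threaded CPU session pinned to `core`."""
    import onnxruntime as rt  # imported lazily so the helpers work without it

    pin_process_to_core(core)
    opts = rt.SessionOptions()
    opts.intra_op_num_threads = 1  # inference runs on the calling thread
    opts.inter_op_num_threads = 1
    return rt.InferenceSession(
        model_path, sess_options=opts, providers=["CPUExecutionProvider"]
    )
```

In a Gunicorn worker this would be called once at startup, e.g. `make_session(config.MODEL_PATH, psutil.Process().cpu_num())`. For sessions with more than one intra-op thread, `affinity_string` gives the value to pass to `add_session_config_entry`.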

Is there a way to set up your server so that sessions are locked to one core each? (Or any other way to get the best response time?)

Cheers! :)

fullymiddleaged, Jan 20 '25 23:01