
Segmentation fault when running onnxruntime inside docker with cpuset restrictions

Open yindavidyang opened this issue 3 years ago • 13 comments

Describe the bug

ONNX Runtime crashes when I run it inside Docker with a CPU restriction specified via "--cpuset-cpus". The crash doesn't happen when running Docker without the "--cpuset-cpus" argument, or when "--cpuset-cpus" grants a large number of CPU cores.


System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • ONNX Runtime installed from (source or binary): pip
  • ONNX Runtime version: 1.7.0
  • Python version: 3.8
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

To Reproduce


Hardware: 32-core AMD CPU (64 threads), 4x 2080 Ti GPUs.

The crash doesn't happen when I provision many cores, such as "--cpuset-cpus 0-31".

docker run --rm -it --gpus all --cpuset-cpus 0-15 nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04

Then, inside the Docker container:

apt update
apt install python3-pip wget
pip3 install onnxruntime
wget https://github.com/onnx/models/blob/master/vision/classification/mnist/model/mnist-7.onnx?raw=true -O mnist.onnx
python3

Then, inside python3:

import onnxruntime as ort
ort.InferenceSession('mnist.onnx') # crash!
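As a diagnostic, the mismatch that appears to trigger this can be observed from inside the container: `os.cpu_count()` reports all logical CPUs on the machine, while the scheduler affinity mask reflects "--cpuset-cpus". A Linux-only sketch (not part of the original report):

```python
import os

# Total logical CPUs the kernel reports; ignores cpuset/cgroup restrictions
total = os.cpu_count()

# CPUs this process is actually allowed to run on; respects --cpuset-cpus
allowed = len(os.sched_getaffinity(0))

print(f"cpu_count={total}, affinity={allowed}")
```

With "--cpuset-cpus 0-15" on the 64-thread machine above, the two numbers would differ (64 vs. 16).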




yindavidyang avatar Apr 01 '21 17:04 yindavidyang

Can you paste the stack trace here?

pranavsharma avatar Apr 01 '21 21:04 pranavsharma

There's no stack trace -- just a one-line message saying core dumped.

yindavidyang avatar Apr 02 '21 20:04 yindavidyang

Run gdb <executable name> <core file name>. See this to get the location of the core file: https://askubuntu.com/a/1109747
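For the Docker case specifically, core collection can be enabled manually instead of via apport. A sketch, with example paths; note that /proc/sys/kernel/core_pattern is shared with the host kernel, so writing it from inside the container may require --privileged:

```shell
# Lift the core-size limit for this shell session
ulimit -c unlimited

# Write core files to /tmp with a predictable name
# (%e = executable name, %p = pid; this setting is global to the host kernel)
echo '/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern

# Reproduce the crash, then open the resulting core file and run `bt`
python3 -c "import onnxruntime as ort; ort.InferenceSession('mnist.onnx')"
gdb python3 /tmp/core.python3.*
```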

pranavsharma avatar Apr 02 '21 21:04 pranavsharma

Could you tell me how to collect a core dump when running inside a Docker image (note that this crash only happens within a Docker image with a cpuset-cpus setting)? I followed the askubuntu instructions but kept getting "read-only file system" errors when trying to configure apport, and the "/var/crash" folder was empty after the crash.

Another question: I guess the executable name should be python3, correct? The command that led to the core dump is python3 -c "import onnxruntime as ort; ort.InferenceSession('mnist.onnx')".

BTW sometimes I got a "bus error (core dumped)" message instead of segmentation fault.

yindavidyang avatar Apr 03 '21 11:04 yindavidyang

I found a "core" file in the current folder where I ran python3. Here's what GDB says:

Core was generated by `python3'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f2862dded52 in std::_Function_handler<bool (), onnxruntime::concurrency::ThreadPoolTempl<onnxruntime::Env>::WorkerLoop(int)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-38-x86_64-linux-gnu.so
[Current thread is 1 (Thread 0x7f284d2ba700 (LWP 825))]

yindavidyang avatar Apr 03 '21 11:04 yindavidyang

I cannot repro the crash.

(pranav-py37) pranav@FooMachine:~$ docker pull nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04
(pranav-py37) pranav@FooMachine:~$ docker run --rm -it --gpus all --cpuset-cpus 0-15 nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04
root@113a6dc63d69:/# python3
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import onnxruntime as ort
>>> ort.InferenceSession('mnist.onnx')
<onnxruntime.capi.onnxruntime_inference_collection.InferenceSession object at 0x7ffb9a09e280>
>>>

pranavsharma avatar Apr 08 '21 08:04 pranavsharma

I also hit this problem. I use mcr.microsoft.com/azureml/onnxruntime:v1.6.0-cuda10.2-cudnn8 with docker-compose cpuset: 0-7, and it crashes with a core dump; the host OS is CentOS Linux release 7.4.1708.

austingg avatar Sep 19 '21 02:09 austingg

Seen this problem as well. A solution that worked for me was to set the number of intra_op_num_threads to something corresponding to the number of available cores:

import onnxruntime as ort
sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = 8
sess = ort.InferenceSession('some_model.onnx', sess_options=sess_options)

srolsorama avatar Nov 10 '21 11:11 srolsorama

Seen this problem as well. A solution that worked for me was to set the number of intra_op_num_threads to something corresponding to the number of available cores.

Thank you so much! That works for me!!!

ppalantir avatar Dec 06 '21 04:12 ppalantir

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

stale[bot] avatar Apr 17 '22 10:04 stale[bot]

Seen this problem as well. A solution that worked for me was to set the number of intra_op_num_threads to something corresponding to the number of available cores.

Thank you so much! It solves my problems as well.

yuleichin avatar Jul 20 '22 11:07 yuleichin

Seen this problem as well. A solution that worked for me was to set the number of intra_op_num_threads to something corresponding to the number of available cores.

Thank you so much!!!

l3yx avatar Sep 16 '22 02:09 l3yx

I can't reproduce the segfault (onnxruntime 1.12); however, when loading the model in a container with the cpuset argument, ort.InferenceSession simply never returns. Setting the number of allowed threads fixes my problem, though.

Buillaume avatar Sep 19 '22 16:09 Buillaume

Seen this problem as well. A solution that worked for me was to set the number of intra_op_num_threads to something corresponding to the number of available cores.

Is intra_op_num_threads supposed to be the number of cores on the machine or the number of cores I want the ONNX model to be restricted to?

timbmg avatar Feb 23 '23 10:02 timbmg

Hello, I wanted to mention that we observe very similar behavior.

We use InferenceSession for CPU-based inference in our production service. We deploy on EKS and specify a CPU request of 3000m in the Kubernetes deployment. If we deploy to a t3.xlarge instance, which hosts one pod, all works well, but when we deploy to a t3.2xlarge, which hosts 2 pods, we start seeing segmentation faults on shutdown. The error happens during exit of the Python process. If we hardcode intra_op_num_threads=1 (or 2), it seems to work.

fsrajer avatar Oct 26 '23 10:10 fsrajer