Termination due to `Bus error (signal 7)`

Open maximilianigl opened this issue 6 years ago • 5 comments

Hi,

I'm running the code using mpirun inside a Docker container. It worked at first, but recently I started getting this error message:

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 135
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)

As far as I'm aware, I didn't change anything. Does anyone know where this might be coming from? Thanks a lot! Best, Max

maximilianigl avatar Nov 30 '18 13:11 maximilianigl
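
For readers trying to interpret the log above: the exit code 135 is consistent with the common convention of reporting 128 plus the signal number when a process is killed by a signal, and 135 - 128 = 7 matches the "signal 7" in the exit string. Below is a minimal sketch of that arithmetic. `decode_exit_code` is a hypothetical helper name and not part of mlsh or mpirun, and signal numbering is platform-dependent (7 is SIGBUS on x86-64 Linux).

```python
import signal


def decode_exit_code(code):
    """Map a shell-style exit code (128 + signal number) to a signal name.

    Hypothetical helper for illustration. Returns None when the code does
    not correspond to a signal known on this platform.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


if __name__ == "__main__":
    # Signal numbers vary by platform, so the printed name may differ
    # outside x86-64 Linux.
    print(decode_exit_code(135))
```

A bus error usually points to a failed memory access rather than a bug in the launcher itself, which fits the shared-memory explanation suggested further down the thread.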

It seems not to happen when I use fewer parallel runs (30 instead of 40 or 50, on a server that supports up to 56 threads).

maximilianigl avatar Nov 30 '18 19:11 maximilianigl

Hi! My computer has two GPUs. When I run this code, the utilization of the first one is only 5%, and the second one is at zero. Do you know why my GPU utilization is so low? Do I need to modify the code to match the configuration of each computer? Thanks!

Muguangfeng avatar Apr 08 '19 08:04 Muguangfeng

Maybe this error stems from the nodes we used.

Up-Huang avatar Nov 13 '20 13:11 Up-Huang

Same thing here bro. As far as I can tell, it has something to do with Docker's default shared memory. I'm not 100% sure yet, but for now I've increased the container's shared memory from 64 MB to several GB to test.

falcaoceg avatar Dec 16 '22 11:12 falcaoceg

It works for me. When I run the Docker image, I add `--shm-size=2000g`.

qiuyuleng1 avatar Jan 19 '24 07:01 qiuyuleng1
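
To check whether the shared-memory limit discussed above applies to a given container, one option is to look at the size of `/dev/shm` from inside it. This is a minimal sketch, assuming a Linux container where shared memory is mounted at `/dev/shm`. The helper names `shm_report` and `looks_like_docker_default` are hypothetical, and the 64 MiB threshold is taken from the default mentioned in this thread.

```python
import os
import shutil

MIB = 1024 * 1024


def shm_report(path="/dev/shm"):
    """Return total/used/free space of the given mount in MiB."""
    usage = shutil.disk_usage(path)
    return {
        "total_mib": usage.total / MIB,
        "used_mib": usage.used / MIB,
        "free_mib": usage.free / MIB,
    }


def looks_like_docker_default(total_mib, default_mib=64):
    """True if the mount is no larger than the assumed 64 MiB default."""
    return total_mib <= default_mib


if __name__ == "__main__":
    # Assumption: shared memory is mounted at /dev/shm (typical on Linux).
    if os.path.isdir("/dev/shm"):
        report = shm_report()
        print(report)
        if looks_like_docker_default(report["total_mib"]):
            print("/dev/shm is at or below 64 MiB; consider a larger --shm-size")
    else:
        print("/dev/shm not found on this system")
```

If the reported total is about 64 MiB, the container is probably running with the default, and starting it with a larger `--shm-size` value, as in the comment above, is worth trying. The right size depends on how many parallel runs are launched.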