gpu-manager

Pods scheduled with gpu-manager fail to start and report Segmentation fault (core dumped); the CUDA version is 11.1

Open Justin-ZL opened this issue 2 years ago • 3 comments

Justin-ZL avatar Feb 08 '23 08:02 Justin-ZL

So which GPU card did you use?

DennisYoung96 avatar May 31 '23 09:05 DennisYoung96

I had the same problem on a Tesla P4.

ls-2018 avatar Nov 20 '23 16:11 ls-2018

Maybe your application calls nvmlInit() to initialize the CUDA environment.

To find out which function causes the segfault, try adding this code at the start of your Python entrypoint (if your application is written in Python):

import faulthandler
faulthandler.enable()
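
To see what faulthandler prints before trying it inside the pod, here is a minimal sketch. It assumes a Linux host and uses ctypes.string_at(0) (a NULL pointer read) as a stand-in for the real crashing call; the helper name run_child is made up for illustration and is not part of gpu-manager or any library.

```python
import subprocess
import sys

# Child script: enable faulthandler first, then deliberately crash.
# ctypes.string_at(0) reads from a NULL pointer; it only stands in for
# whatever call actually segfaults in the real application.
CHILD = """
import faulthandler
faulthandler.enable()

import ctypes

def crash():
    ctypes.string_at(0)

crash()
"""


def run_child():
    """Run the crashing script in a separate process and capture its stderr."""
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_child()
    print("return code:", result.returncode)
    # The captured stderr holds the Python traceback dumped by faulthandler,
    # which names the function that was running when the fault happened.
    print(result.stderr)
```

If editing the entrypoint is inconvenient, faulthandler can also be switched on from outside the code by setting the PYTHONFAULTHANDLER=1 environment variable in the pod spec or by starting the interpreter with python -X faulthandler.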

If nvmlInit() is what causes the crash, the following code can work around it:

from ctypes import CDLL, c_int, byref

# Load the NVML shared library directly and call the versioned init entry point.
nvml_h = CDLL("libnvidia-ml.so.1")
nvml_h.nvmlInit_v2()
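
A slightly more defensive variant of the same idea, as a sketch only: it checks whether the library loads and whether nvmlInit_v2 reports success, instead of ignoring the return value. The function name try_nvml_init is hypothetical; the value 0 for NVML_SUCCESS comes from the NVML headers. Whether this avoids the segfault under gpu-manager is not verified here.

```python
from ctypes import CDLL

NVML_SUCCESS = 0  # value of NVML_SUCCESS in the NVML headers


def try_nvml_init(lib_name="libnvidia-ml.so.1"):
    """Try to load NVML and call nvmlInit_v2; return (ok, detail)."""
    try:
        lib = CDLL(lib_name)
    except OSError as exc:
        # The shared library is missing or cannot be loaded in this container.
        return False, f"could not load {lib_name}: {exc}"

    ret = lib.nvmlInit_v2()
    if ret != NVML_SUCCESS:
        return False, f"nvmlInit_v2 returned error code {ret}"
    return True, "NVML initialized"


if __name__ == "__main__":
    ok, detail = try_nvml_init()
    print(ok, detail)
```

Running this at the very start of the entrypoint, after faulthandler.enable(), shows whether the direct nvmlInit_v2 call succeeds in the pod; it only confirms that one call and does not replace reading the faulthandler traceback.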

hnyoumfk avatar May 29 '24 08:05 hnyoumfk