neural-style

Error when running the command nvidia-smi

Open 251099155 opened this issue 8 years ago • 14 comments

My system is Ubuntu 14.04 on Windows 10. I have installed CUDA 8.0, but when I run the command nvidia-smi, I get an error like this:

modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.4.0+/modules.dep.bin'
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.4.0+/modules.dep.bin'
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_367'
modprobe: ERROR: could not insert 'nvidia_367': Function not implemented
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

How can I solve it?

251099155 avatar Oct 07 '16 06:10 251099155

The installation of the NVIDIA driver has failed. From this output it is impossible to know why. One reason could be missing kernel headers. The NVIDIA documentation and forum are the best sources of help for this kind of problem.

htoyryla avatar Oct 07 '16 08:10 htoyryla

@htoyryla I have tried to install the NVIDIA driver, but it says that it can't find kernel 3.4.0+. I ran the command apt-cache search linux-headers, but there seems to be no package named linux-headers-3.4.0+. [image: qq 20161007172149]

251099155 avatar Oct 07 '16 09:10 251099155

Where did you get this 3.4.0+ from? It does not sound right. Which kernel do you have? What does uname -r give? I would guess 3.13.0-something.

The installation guide at http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#axzz4MOGxcY2j gives the following supported versions.

Ubuntu 16.04: kernel 4.4.0
Ubuntu 14.04: kernel 3.13

Furthermore from the guide, a helpful command to install the correct kernel headers:

Ubuntu
The kernel headers and development packages for the currently running kernel can be installed with:
$ sudo apt-get install linux-headers-$(uname -r)
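
Before running that command it can help to see which kernel is actually running and whether headers for it are already present. The following is a minimal sketch, not something from the NVIDIA guide; it assumes the usual Ubuntu layout where /lib/modules/<release>/build points at the installed headers, and `headers_dir` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report whether headers for the running kernel look installed.
# headers_dir is a hypothetical helper, not part of any NVIDIA or Ubuntu tool.

headers_dir() {
    # Assumption: on Ubuntu, /lib/modules/<release>/build is a symlink
    # to the headers installed by the linux-headers-<release> package.
    printf '/lib/modules/%s/build\n' "$1"
}

release="$(uname -r)"
echo "Running kernel: ${release}"

if [ -d "$(headers_dir "${release}")" ]; then
    echo "Kernel headers found at $(headers_dir "${release}")"
else
    echo "Kernel headers missing; try: sudo apt-get install linux-headers-${release}"
fi
```

If the release printed here has no matching linux-headers package in apt-cache search, the driver build has nothing to compile against.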

I am running 14.04 with a 4.4.0 kernel (although still cuda 8.0rc, have not yet moved to 8.0).

htoyryla avatar Oct 07 '16 09:10 htoyryla

@htoyryla Here is some info about my system. So what is the problem? I got this system from Windows 10. [image]

251099155 avatar Oct 07 '16 10:10 251099155

I can't really help any further. You seem to have a 3.4.0 kernel, but instead of a precise version number it has this +. Google doesn't give anything helpful on "kernel 3.4.0+" either.

Anyhow, 3.4.0 is old already and might not even work with cuda 8.0 (the docs state 3.13 and 4.4.0).

htoyryla avatar Oct 07 '16 10:10 htoyryla

I just realized (based on another issue) that you are running on Windows bash. Others, too, seem to be complaining about not finding the kernel headers for 3.4.0. I wonder whether it is possible to update the kernel in Windows bash?

Update: I guess you are stuck with that kernel. I saw somewhere that "the Ubuntu userspace is running not on a Linux kernel, but WSL. WSL provides the API hooks to look like Linux to Ubuntu and Linux applications, but it's not the same thing." So one cannot replace it with another Linux kernel; one is stuck with whatever Microsoft provides.
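
A quick way to tell whether a shell is running under Windows bash rather than on a real Linux kernel might look like the sketch below. It rests on an assumption not stated in this thread: that the WSL kernel identifies itself with the string "Microsoft" in /proc/version. `is_wsl_version_string` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: guess whether this shell runs under Windows bash (WSL).
# Assumption: WSL reports "Microsoft" somewhere in /proc/version.

is_wsl_version_string() {
    # Succeeds if the given string mentions "microsoft" (case-insensitive).
    printf '%s' "$1" | grep -qi 'microsoft'
}

if [ -r /proc/version ] && is_wsl_version_string "$(cat /proc/version)"; then
    echo "This looks like Windows bash (WSL); kernel modules such as nvidia cannot be loaded here."
else
    echo "No Microsoft marker in /proc/version; this looks like a regular Linux kernel."
fi
```

If the first message is printed, installing kernel headers or a driver from inside that environment will not help, for the reason given above.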

htoyryla avatar Oct 07 '16 11:10 htoyryla

Yes, I'm running on Windows 10's bash. Thanks a lot. But I don't know how to change my system's kernel.

It's so hard to change the kernel. I can only use the lbfgs method.

251099155 avatar Oct 07 '16 11:10 251099155

@251099155 Hello, I have the same problem. Have you solved it?

hanyaqian avatar May 11 '17 12:05 hanyaqian

I had a similar problem. My system is Red Hat 7, my GPU is a K80 on Azure, and my CUDA is CUDA 8. My error was: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I solved it with modprobe nvidia.

Then everything was OK. My God.

Matrixsun avatar Jun 10 '17 06:06 Matrixsun

@Matrixsun I have the same issue, but when I run modprobe nvidia I get this error: "modprobe: ERROR: could not insert 'nvidia_375': No such device".

Would you mind elaborating on how you solved "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver"?

My setup: Ubuntu 14.04, K80, on Azure.
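
One possible reading of "No such device" is that the module was found but did not see a GPU it could drive, so a first check could be whether the VM exposes an NVIDIA device at all. This is a hedged sketch, not a confirmed diagnosis for this case; `has_nvidia_device` is a hypothetical helper, and lspci comes from the pciutils package on Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch: check whether any NVIDIA device is visible on the PCI bus.

has_nvidia_device() {
    # Reads lspci-style output on stdin; succeeds if any line mentions NVIDIA.
    grep -qi 'nvidia'
}

if ! command -v lspci >/dev/null 2>&1; then
    echo "lspci not found (on Ubuntu it is in the pciutils package)"
elif lspci | has_nvidia_device; then
    echo "An NVIDIA device is visible on the PCI bus"
else
    echo "No NVIDIA device visible; check that the VM size actually exposes a GPU"
fi
```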

rizky avatar Jun 16 '17 12:06 rizky

@rizkyario Sorry for being so brief. Details follow. Azure's Linux system lacks a lot of basic packages. My solution was to install the packages I needed, one by one, before installing CUDA:
1. Install GCC: yum install gcc*
2. Install dkms:
   wget http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-9.noarch.rpm
   rpm -ivh epel-release-7-9.noarch.rpm
   yum install --enablerepo=epel dkms
3. Install the kernel-related components: yum install kernel*
4. Install CUDA.
5. Run modprobe nvidia.
After that it was OK.
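
The steps above can be collected into one script. This is only a sketch for a Red Hat 7 style VM. The `run` and `install_prerequisites` helpers and the DRY_RUN variable are hypothetical names, the -y flag and the quoting of the package globs are additions not in the original commands, and step 4 is left out because the thread does not say which CUDA installer was used. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the prerequisite steps. Prints commands unless DRY_RUN=0
# is set, in which case they are executed (run as root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Hypothetical helper: echo the command in dry-run mode, else execute it.
    if [ "${DRY_RUN}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_prerequisites() {
    # Step 1: compiler toolchain (glob quoted so yum, not the shell, expands it).
    run yum install -y 'gcc*'
    # Step 2: enable EPEL, then install dkms from it.
    run wget http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-9.noarch.rpm
    run rpm -ivh epel-release-7-9.noarch.rpm
    run yum install -y --enablerepo=epel dkms
    # Step 3: kernel-related packages, including headers for building the module.
    run yum install -y 'kernel*'
}

install_prerequisites
# Step 4 (installing CUDA) goes here; the installer is not specified in the thread.
# Step 5: load the driver module.
run modprobe nvidia
```

Running it once in the default dry-run mode shows exactly what would be installed before anything is changed.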

Matrixsun avatar Jun 17 '17 06:06 Matrixsun

I am trying to run NVIDIA driver on AWS running with Ubuntu 16.04. I have installed kernel headers and successfully installed the NVIDIA-375.26 driver that comes along with official CUDA-8.0 release. However I keep getting "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running", no matter how many times I reinstall, run sudo update-initramfs -u and reboot. When I tried the method recommended by @Matrixsun, I got the following error: "modprobe: FATAL: Module nvidia not found in directory /lib/modules/4.4.0-1020-aws". What do I do to resolve this?
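
The "Module nvidia not found in directory" message suggests that no nvidia module exists for the kernel that is currently running, which can happen if the driver was built against a different kernel version. A minimal sketch for checking this follows; `find_nvidia_modules` is a hypothetical helper, and the dkms call is only made if dkms happens to be installed.

```shell
#!/usr/bin/env bash
# Sketch: look for nvidia kernel modules built for the running kernel.

find_nvidia_modules() {
    # Lists module files such as nvidia.ko or nvidia_375.ko under a directory.
    find "$1" -name 'nvidia*.ko*' 2>/dev/null
}

moddir="/lib/modules/$(uname -r)"
if [ -n "$(find_nvidia_modules "${moddir}")" ]; then
    echo "nvidia module present under ${moddir}"
else
    echo "No nvidia module under ${moddir}; the driver was probably not built for this kernel"
fi

# If the driver was installed through DKMS, this shows which kernels it was built for.
if command -v dkms >/dev/null 2>&1; then
    dkms status
fi
```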

Laqshay avatar Jun 28 '17 06:06 Laqshay

I am trying to run NVIDIA driver on AWS running with Ubuntu 16.04

What instance type?

3DTOPO avatar Jun 28 '17 07:06 3DTOPO

p2.xlarge with NVIDIA Tesla K80

Laqshay avatar Jun 28 '17 09:06 Laqshay