simrdwn
CUDA driver version is insufficient for CUDA runtime version
I cannot get simrdwn to train. TensorFlow reports that my CUDA driver version is insufficient for the CUDA runtime version. I know this may not be a problem with this repository specifically, but everything seems to be configured properly on my end, so I am at a loss to explain this behaviour.
I first tried the default repository configuration and received this very same error. I only have CUDA 9.1 because I then changed the first line of the Dockerfile from
nvidia/cuda:9.0-devel-ubuntu16.04
to
nvidia/cuda:9.1-devel-ubuntu16.04
This is the error I get:
Traceback (most recent call last):
File "/tensorflow/models/research/object_detection/model_main.py", line 109, in <module>
tf.app.run()
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/tensorflow/models/research/object_detection/model_main.py", line 105, in main
tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 611, in run
return self.run_local()
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 712, in run_local
saving_listeners=saving_listeners)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model_default
saving_listeners)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1403, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 508, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 934, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1122, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1127, in _create_session
return self._sess_creator.create_session()
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 805, in create_session
self.tf_sess = self._session_creator.create_session()
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 571, in create_session
init_fn=self._scaffold.init_fn)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 281, in prepare_session
config=config)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 184, in _restore_checkpoint
sess = session.Session(self._target, graph=self._graph, config=config)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/opt/conda/envs/simrdwn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
This is the output of nvcc --version (run from inside the container):
Cuda compilation tools, release 9.1, V9.1.85
(Again, I know the Dockerfile specifies v9.0, but I was getting the same error with it, which is why I tried bumping it up.)
This is the output of nvidia-smi (run from outside the container):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 845M        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P0    N/A /  N/A |    167MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1748      G   /usr/lib/xorg/Xorg                           166MiB |
+-----------------------------------------------------------------------------+
And according to the release notes, these should be compatible:
CUDA Toolkit | Linux x86_64 Driver Version
CUDA 9.1 (9.1.85) | >= 390.46
So since I have driver version 390.116 and CUDA Toolkit version 9.1, I can't explain why the container keeps throwing this error.
Do you have any idea?
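For anyone repeating this check, here is a minimal Python sketch of the comparison described above. The 9.1 row is the one quoted from the release notes in this post; the other rows are minimums as I recall them from the same NVIDIA table and should be verified there before relying on them. The names `MIN_DRIVER`, `parse_version` and `driver_satisfies` are hypothetical, chosen only for illustration. Versions are compared component by component, because read as decimals 390.116 would wrongly look smaller than 390.46.

```python
# Minimum Linux x86_64 driver version per CUDA toolkit release.
# Only the 9.1 row comes from this thread; the other rows are assumptions
# recalled from the NVIDIA release notes and should be double-checked.
MIN_DRIVER = {
    "9.0": "384.81",
    "9.1": "390.46",   # row quoted in the post above
    "10.0": "410.48",
    "10.1": "418.39",
}


def parse_version(text):
    """Turn '390.116' into (390, 116) so parts compare as integers."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_satisfies(driver, cuda):
    """Return True if `driver` meets the minimum listed for toolkit `cuda`."""
    return parse_version(driver) >= parse_version(MIN_DRIVER[cuda])


if __name__ == "__main__":
    driver = "390.116"  # value reported by nvidia-smi in the post above
    for cuda in sorted(MIN_DRIVER, key=parse_version):
        verdict = "ok" if driver_satisfies(driver, cuda) else "driver too old"
        print("CUDA {}: needs >= {} -> {}".format(cuda, MIN_DRIVER[cuda], verdict))
```

This only checks the toolkit version baked into the image. If the TensorFlow build inside the container was compiled against, or ships with, a newer CUDA runtime than the image's toolkit, that runtime's minimum driver would be the one that matters.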
You need to upgrade your graphics driver. My driver version is 418.39.
What is your graphics card model?
P100