jax
No kernel image is available for execution on the device
I have installed through
pip3 install --upgrade jax jaxlib==0.1.61+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
nvcc --version shows
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
CUDA 11.1 is at /usr/local/cuda-11.1
and yet I am getting
RuntimeError: Unknown: no kernel image is available for execution on the device
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(44): 'cuLinkAddData( link_state, CU_JIT_INPUT_CUBIN, static_cast<void*>(image.bytes.data()), image.bytes.size(), "", 0, nullptr, nullptr)'
Output of nvidia-smi:
Tue Feb 16 21:26:58 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 166... Off | 00000000:01:00.0 On | N/A |
| N/A 53C P8 6W / N/A | 684MiB / 5944MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1700 G /usr/lib/xorg/Xorg 106MiB |
| 0 N/A N/A 9639 G /usr/lib/xorg/Xorg 288MiB |
| 0 N/A N/A 9833 G /usr/bin/gnome-shell 136MiB |
| 0 N/A N/A 10493 G ...AAAAAAAAA= --shared-files 7MiB |
| 0 N/A N/A 81098 G ...gAAAAAAAAA --shared-files 135MiB |
+-----------------------------------------------------------------------------+
when trying the quickstart example.
Can you share the output of nvidia-smi as well? (This shows what GPU you have, etc.)
Added now. Thanks for flagging.
I'm not sure why this happens, but I see the same thing if I have driver version 450.102.04 with CUDA version 11.1.
According to NVIDIA, these two versions should work together (https://docs.nvidia.com/deploy/cuda-compatibility/index.html), and I don't know why they don't. I suggest either upgrading your driver or installing an older CUDA release.
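For anyone comparing their own setup, here is a minimal sketch that pulls the two version numbers out of the `nvidia-smi` header and the `nvcc --version` output and reports whether the toolkit is newer than what the driver advertises. The function names are made up for illustration, and the regular expressions only assume the output formats pasted in this thread. A mismatch in this direction is what this thread associates with the error; it is not a definitive compatibility check.

```python
import re

def driver_cuda_version(nvidia_smi_text):
    """Return the (major, minor) 'CUDA Version' shown in the nvidia-smi header."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    return (int(m.group(1)), int(m.group(2))) if m else None

def toolkit_cuda_version(nvcc_text):
    """Return the (major, minor) release shown by `nvcc --version`."""
    m = re.search(r"release\s+(\d+)\.(\d+)", nvcc_text)
    return (int(m.group(1)), int(m.group(2))) if m else None

def toolkit_newer_than_driver(nvidia_smi_text, nvcc_text):
    """True if the installed toolkit is newer than the driver's CUDA version."""
    driver = driver_cuda_version(nvidia_smi_text)
    toolkit = toolkit_cuda_version(nvcc_text)
    if driver is None or toolkit is None:
        raise ValueError("could not find a CUDA version in the given text")
    return toolkit > driver

# Sample lines copied from the original report in this thread.
smi = "| NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 |"
nvcc = "Cuda compilation tools, release 11.1, V11.1.105"
print(toolkit_newer_than_driver(smi, nvcc))  # True
```

In practice the two strings would come from running `nvidia-smi` and `nvcc --version` on the affected machine.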
Hello,
I am experiencing a similar problem, even with a newer version of jaxlib
python3 -m pip install --upgrade jax jaxlib==0.1.62+cuda112 -f https://storage.googleapis.com/jax-releases/jax_releases.html
In my case, nvcc --version shows:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Jan_28_19:32:09_PST_2021
Cuda compilation tools, release 11.2, V11.2.142
Build cuda_11.2.r11.2/compiler.29558016_0
and nvidia-smi gives me this output:
Fri Mar 12 13:54:27 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04 Driver Version: 450.102.04 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-SXM4-40GB On | 00000000:07:00.0 Off | 0 |
| N/A 23C P0 52W / 400W | 0MiB / 40537MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The actual error that I am getting is the following:
Traceback (most recent call last):
File "test.py", line 5, in <module>
x = jnp.arange(10)
File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 3045, in arange
return lax.iota(dtype, np.ceil(start)) # avoids materializing
File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 1492, in iota
return iota_p.bind(dtype=dtype, shape=(size,), dimension=0)
File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/core.py", line 284, in bind
out = top_trace.process_primitive(self, tracers, params)
File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/core.py", line 622, in process_primitive
return primitive.impl(*tracers, **params)
File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 241, in apply_primitive
compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args), **params)
File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/util.py", line 198, in wrapper
return cached(bool(config.x64_enabled), *args, **kwargs)
File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/util.py", line 191, in cached
return f(*args, **kwargs)
File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 291, in xla_primitive_callable
compiled = backend_compile(backend, built_c, options)
File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 355, in backend_compile
return backend.compile(built_c, compile_options=options)
RuntimeError: Unknown: no kernel image is available for execution on the device in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(44): 'cuLinkAddData( link_state, CU_JIT_INPUT_CUBIN, static_cast<void*>(image.bytes.data()), image.bytes.size(), "", 0, nullptr, nullptr)'
Note that I manually set the environmental variable to the CUDA path with
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/gpfs/fs1/soft/swing/manual/cuda/11.2.1
Thanks for helping
We've confirmed this issue is caused by using too new a version of CUDA with too old a driver version. If you see this issue, the workaround is either to use an older CUDA release or a newer NVIDIA driver.
We may also be able to work around this at the JAX level in the future.
I encountered this problem on an HPC of which I'm not an admin, and I set XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to suppress this error.
I think it can be a problem with singularity, but just want to share my experience here.
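For anyone who would rather set the flag from Python than from the shell, a minimal sketch follows. The helper name `add_xla_flag` is made up for illustration; it only appends to whatever `XLA_FLAGS` already holds, so an existing `--xla_gpu_cuda_data_dir=...` setting is kept. I'm assuming the variable has to be in place before JAX initializes its backend, so the safe choice is to run this before `import jax`.

```python
import os

def add_xla_flag(flag, env=os.environ):
    """Append `flag` to XLA_FLAGS unless it is already present."""
    current = env.get("XLA_FLAGS", "").split()
    if flag not in current:
        current.append(flag)
    env["XLA_FLAGS"] = " ".join(current)
    return env["XLA_FLAGS"]

# Run this before `import jax` so the flag is visible when the GPU backend starts.
add_xla_flag("--xla_gpu_force_compilation_parallelism=1")
print(os.environ["XLA_FLAGS"])
```

As noted elsewhere in this thread, the flag trades the error for single-threaded compilation, so it is a workaround rather than a fix.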
Setting the XLA flag works for me as well. But it comes with the warning
The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.
Losing multithreaded compilation sounds undesirable. Can we get a more permanent solution?
I'm also seeing this issue. I'm on NixOS 20.09.3301.42809feaa9f, jaxlib 0.1.71, and here's my nvidia-smi:
[nix-shell:~/dev/nixpkgs]$ nvidia-smi
Sat Sep 4 20:47:26 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38 Driver Version: 455.38 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 34C P0 25W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Based on the NVIDIA docs, it seems like these two versions should be compatible.
Passing XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 does work, but I'd rather not get hit with that slowdown.
I just upgraded to NixOS 21.05.2796.110a2c9ebbf to get driver version 470.57.02, and the issue has gone away.
I am using Centos 7, and I was having the issue that is mentioned here, and the problem was solved after upgrading the nvidia driver from 460 to 470.74.
I'm having the same issue on an HPC using a singularity container. I'm not the admin so I can't update the nvidia driver. If a jax workaround that still allows multithreaded compilation is possible, that would be awesome!
I'm getting the same error, and XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 doesn't work for me; it introduces a new error.
jax version
packages in environment at /home/energy/amawi/miniconda3/envs/pansatz:
Name Version Build Channel
jax 0.2.24 pypi_0 pypi
jaxlib 0.1.73+cuda11.cudnn82 pypi_0 pypi
nvcc and nvidia-smi
nvidia-smi
Mon Oct 25 13:38:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31 Driver Version: 465.31 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:1A:00.0 Off | N/A |
| 30% 26C P8 20W / 350W | 1MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
and
whereis nvcc
nvcc: /usr/local/cuda-11.4/bin/nvcc.profile /usr/local/cuda-11.4/bin/nvcc
with
/usr/local/cuda-11.4
cuDNN v8.2.4
Before XLA flag
2021-10-25 13:59:17.326622: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc:63] cuLinkAddData fails. This is usually caused by stale driver version.
2021-10-25 13:59:17.326873: E external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:1105] The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.
...
File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 24, in run_vmc
keys = rnd.PRNGKey(cfg['seed'])
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 122, in PRNGKey
key = prng.seed_with_impl(impl, seed)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 203, in seed_with_impl
return PRNGKeyArray(impl, impl.seed(seed))
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 241, in threefry_seed
k1 = convert(lax.shift_right_logical(seed_arr, lax._const(seed_arr, 32)))
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 408, in shift_right_logical
return shift_right_logical_p.bind(x, y)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 272, in bind
out = top_trace.process_primitive(self, tracers, params)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 624, in process_primitive
return primitive.impl(*tracers, **params)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 312, in apply_primitive
**params)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/util.py", line 187, in wrapper
return cached(config._trace_context(), *args, **kwargs)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/util.py", line 180, in cached
return f(*args, **kwargs)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 335, in xla_primitive_callable
prim.name, donated_invars, *arg_specs)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 654, in _xla_callable_uncached
*arg_specs).compile().unsafe_call
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 770, in compile
self.name, self.hlo(), *self.compile_args)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 798, in from_xla_computation
compiled = compile_or_get_cached(backend, xla_computation, options)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 87, in compile_or_get_cached
return backend_compile(backend, computation, compile_options)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 369, in backend_compile
return backend.compile(built_c, compile_options=options)
RuntimeError: UNKNOWN: no kernel image is available for execution on the device in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(66): 'status'
After including flag
File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 26, in run_vmc
keys = rnd.split(keys, cfg['n_devices']).reshape(cfg['n_devices'], 2)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 191, in split
return _return_prng_keys(wrapped, _split(key, num))
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 177, in _split
return key._split(num)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 191, in _split
return PRNGKeyArray(self.impl, self.impl.split(self._keys, num))
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 422, in threefry_split
return _threefry_split(key, int(num)) # type: ignore
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/api.py", line 419, in cache_miss
donated_invars=donated_invars, inline=inline)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1632, in bind
return call_bind(self, fun, *args, **params)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1623, in call_bind
outs = primitive.process(top_trace, fun, tracers, params)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1635, in process
return trace.process_call(self, fun, tracers, params)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 627, in process_call
return primitive.impl(f, *tracers, **params)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 584, in _xla_call_impl
out = compiled_fun(*args)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 977, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: the provided PTX was compiled with an unsupported toolchain.
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "demo_vmc.py", line 21, in <module>
log = run_vmc(cfg)
File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 26, in run_vmc
keys = rnd.split(keys, cfg['n_devices']).reshape(cfg['n_devices'], 2)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 191, in split
return _return_prng_keys(wrapped, _split(key, num))
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 177, in _split
return key._split(num)
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 191, in _split
return PRNGKeyArray(self.impl, self.impl.split(self._keys, num))
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 422, in threefry_split
return _threefry_split(key, int(num)) # type: ignore
File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 977, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: the provided PTX was compiled with an unsupported toolchain.
Running without the XLA flag but with TF_CPP_MIN_LOG_LEVEL=0, as in issue #7118, returns the same error.
Running with export XLA_PYTHON_CLIENT_PREALLOCATE=false and without the XLA flag (the solution in #7118) also returns the same error.
I have the same problem on an RTX 3090.
jax 0.2.25
jaxlib 0.1.74+cuda11.cudnn82
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
| NVIDIA-SMI 470.74 Driver Version: 470.74 CUDA Version: 11.4 |
Running with XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 gives another error (could be related to #8506):
RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
...
Edit: I got everything working by using conda-forge with this merge-pending jaxlib-gpu build: https://github.com/conda-forge/jaxlib-feedstock/pull/72
I have the same problem on an RTX 3090. jax 0.2.25 jaxlib 0.1.74+cuda11.cudnn82 Cuda compilation tools, release 11.5, V11.5.119 Build cuda_11.5.r11.5/compiler.30672275_0 | NVIDIA-SMI 470.74 Driver Version: 470.74 CUDA Version: 11.4 |
Running with force compilation parallel =1 gives another error (could be related to #8506):
RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for: ...
Edit: I get everything working by using 'conda-forge' with this merge-pending jaxlib-gpu: conda-forge/jaxlib-feedstock#72
Hello, can you tell me specifically how to install this package? I downloaded it locally from https://anaconda.org/wolfv/jaxlib/files and then tried to install it with conda install --use-local jaxlib-0.1.73-cuda112py39h52c056e_0.tar.bz2, but it doesn't work.
@qinggeduoqing Install everything from conda-forge, e.g. conda install tensorflow jax -c conda-forge. Then run conda install jaxlib -c wolfv.
@lkhphuc Thank you so much for your reply, let me try this way...
I get an error with the following command:
>>> rng, init_rng = jax.random.split(jax.random.PRNGKey(1))
2022-02-02 17:44:08.505863: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2089] Execution of replica 0 failed: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: no kernel image is available for execution on the device
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mohammad/Projects/optimizer/jax/jax/_src/random.py", line 188, in split
return _return_prng_keys(wrapped, _split(key, num))
File "/home/mohammad/Projects/optimizer/jax/jax/_src/random.py", line 174, in _split
return key._split(num)
File "/home/mohammad/Projects/optimizer/jax/jax/_src/prng.py", line 187, in _split
return PRNGKeyArray(self.impl, self.impl.split(self._keys, num))
File "/home/mohammad/Projects/optimizer/jax/jax/_src/prng.py", line 435, in threefry_split
return _threefry_split(key, int(num)) # type: ignore
File "/home/mohammad/Projects/optimizer/jax/jax/_src/traceback_util.py", line 165, in reraise_with_filtered_traceback
return fun(*args, **kwargs)
File "/home/mohammad/Projects/optimizer/jax/jax/_src/api.py", line 430, in cache_miss
out_flat = xla.xla_call(
File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 1681, in bind
return call_bind(self, fun, *args, **params)
File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 1693, in call_bind
outs = top_trace.process_call(primitive, fun, tracers, params)
File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 594, in process_call
return primitive.impl(f, *tracers, **params)
File "/home/mohammad/Projects/optimizer/jax/jax/_src/dispatch.py", line 145, in _xla_call_impl
out = compiled_fun(*args)
File "/home/mohammad/Projects/optimizer/jax/jax/_src/dispatch.py", line 444, in _execute_compiled
out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: no kernel image is available for execution on the device
The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.
JAX can see the GPUs, though:
>>> jax.devices()
[GpuDevice(id=0, process_index=0), GpuDevice(id=1, process_index=0)]
I installed jaxlib 1.77.0 and jax 0.2.28 from source. I am using CUDA 11.5 and cuDNN 8.3.0. I made sure that the PATH env variable is set up properly and that the Python session is loading the correct CUDA libraries. I'm not sure what else can be wrong. I'm running the program on Ubuntu 20.04 with 2 GTX 1080s.
@mshafiei Can you share the output of nvidia-smi?
@hawkinsp sure,
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:03:00.0 Off | N/A |
| 27% 34C P8 10W / 180W | 7328MiB / 8116MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:04:00.0 Off | N/A |
| 28% 31C P8 6W / 180W | 212MiB / 8119MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 214915 C python 107MiB |
| 0 N/A N/A 266904 C python 7219MiB |
| 1 N/A N/A 214915 C python 105MiB |
| 1 N/A N/A 266904 C python 105MiB |
Hmm. Interesting. I would have expected that to work. My best suggestion for something to try to get unblocked would be to build jaxlib from source, explicitly opting in to the CUDA capability for your device (https://jax.readthedocs.io/en/latest/developer.html). There's an option to pass a list of CUDA compute capabilities to build.py via a flag (try --help).
@hawkinsp I am actually building jaxlib from source and passing the cuda specifications as below,
python ./build/build.py \
--enable_cuda \
--cuda_path='/usr/local/cuda-11.5' \
--cudnn_path='/usr/local/cuda-11.5' \
--cuda_version='11.5' \
--cudnn_version='8.3.0' \
--cuda_compute_capabilities 8.0 \
--noenable_mkl_dnn
Are these flags what you were referring to?
In my case, making sure that nvcc --version and nvidia-smi were both at the same version (11.4 in my case) fixed the problem.
When nvcc --version was at 11.7:
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
and nvidia-smi was showing cuda version 11.4:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
I was getting the error above, and I could only work around it by setting
export XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1. Note, however, that this disables parallelism when compiling your model, so if you have a big model it comes at a significant cost. In my case, parallel compilation reduced the XLA compilation time from ~4.5 minutes to ~1.5 minutes.
@lee-van-oetz is this fixed now?
I'm pretty sure this is fixed in recent jaxlib releases. We added code to jaxlib that falls back to not using cuLink... if the driver version is too old.
Hi, I am also facing this problem.
I use the following:
- Windows 10, 19044 (21H2)
- Visual Studio 2019
- Nvidia GeForce GTX 960M, Maxwell, compute capability 5.0
- CUDA Toolkit 11.7, 11.8 or 12.0
- I want to write simple CUDA C++ code.
After I installed all the C++ packages in VS2019, I installed the CUDA Toolkit. When I run the sample code, I get the following error:
- No kernel image is available for execution on the device.
I tried CUDA 11.7, 11.8 and 12.0 with both VS2019 and VS2022, but the error persists. I have been facing this error for 20 days and I am really fed up. deviceQuery.exe, nvidia-smi and nvcc --version all run fine. I have also checked NVIDIA's and other sites, and my GPU should have no problem with these CUDA versions. What could cause this error?
Please help me out. Thanks, all.
Anyone else still struggling with this? I've tried dozens of combinations of nvidia drivers, ubuntu's nvidia-cuda-toolkit, installing different versions of cuda from nvidia's website, jaxlib and jax, nothing has solved the problem.
I'm using Ubuntu 23.04 with a GeForce 940MX.
nvidia-smi Driver Version: 525.147.05 CUDA Version: 12.0
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
jaxlib==0.4.20+cuda11.cudnn86 jax==0.4.20
print(jax.devices()) shows the GPU
[cuda(id=0)]
But trying to use it results in
XlaRuntimeError: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.custom_call' failed: jaxlib/gpu/prng_kernels.cc:33: operation gpuGetLastError() failed: no kernel image is available for execution on the device; current tracing scope: custom-call.1; current profiling annotation: XlaModule:#hlo_module=jit__normal,program_id=2#.
FWIW, I had jax working fine just a few days ago, might be caused by a recent update to 23.04.
@deoxyribose I think the problem is that we are building for GPUs with SM version 5.2 at a minimum: https://github.com/google/jax/blob/961ba3cd4290e4f30573a12a5f7ae3db26856320/.bazelrc#L68
but your GPU appears to have SM version 5.0.
The fix is to build jaxlib yourself, explicitly specifying your SM version. Try:
python build/build.py --enable_cuda --cuda_compute_capabilities=sm_50
We've actually never shipped support for that model of GPU.
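To make the SM-version point concrete, here is a rough sketch of the matching rule as I understand it from NVIDIA's compatibility notes: a binary built for `sm_XY` runs on devices with the same major version and an equal or higher minor version, while embedded PTX for `compute_XY` can be JIT-compiled for any device at or above that capability. The function name and the target list below are hypothetical (the list is not the actual contents of the .bazelrc); only the 5.2 minimum and the 5.0 device come from this thread.

```python
def device_supported(device_cc, targets):
    """Check whether a device with compute capability `device_cc` (major, minor)
    can run code built for `targets`, e.g. ["sm_52", "compute_70"].

    Assumed rule: sm_XY binaries need the same major and minor >= Y;
    compute_XY PTX can be JIT-compiled for any capability >= X.Y.
    """
    for target in targets:
        kind, digits = target.split("_")
        cc = (int(digits[:-1]), int(digits[-1]))
        if kind == "sm" and cc[0] == device_cc[0] and cc[1] <= device_cc[1]:
            return True
        if kind == "compute" and cc <= device_cc:
            return True
    return False

# Hypothetical build list whose lowest target is sm_52.
targets = ["sm_52", "sm_60", "compute_70"]
print(device_supported((5, 0), targets))  # False: an SM 5.0 GPU has no matching kernel
print(device_supported((6, 1), targets))  # True: covered by the sm_60 binary
```

Under that rule an SM 5.0 device finds no usable kernel image in such a build, which matches the error message, and adding `sm_50` to the build list is what makes it pass.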
@deoxyribose #18644 will add sm_50 support to the next jaxlib release.
@hawkinsp Thanks for the quick reply!
When I try to build, I get
ERROR: @xla//xla/python:enable_gpu :: Error loading option @xla//xla/python:enable_gpu: no such package '@local_config_cuda//cuda': Repository command failed Inconsistent CUDA toolkit path: /usr vs /usr/lib
I tried removing and installing cuda with
sudo apt install nvidia-cuda-toolkit
I'm not sure if this is the expected location:
$ whereis cuda
cuda: /usr/lib/cuda /usr/include/cuda
$ which nvcc
/usr/bin/nvcc
$ whereis cuda.h
cuda.h: /usr/include/cuda.h
I tried with
python build/build.py --enable_cuda --cuda_compute_capabilities=sm_50 --cuda_path=/usr/
but no difference - any tips?