jax No kernel image is available for execution on the device

I installed JAX with:

pip3 install --upgrade jax jaxlib==0.1.61+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html

nvcc --version shows:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

CUDA 11.1 is installed at /usr/local/cuda-11.1, yet I get:

RuntimeError: Unknown: no kernel image is available for execution on the device
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(44): 'cuLinkAddData( link_state, CU_JIT_INPUT_CUBIN, static_cast<void*>(image.bytes.data()), image.bytes.size(), "", 0, nullptr, nullptr)'

Output of nvidia-smi:

Tue Feb 16 21:26:58 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   53C    P8     6W /  N/A |    684MiB /  5944MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1700      G   /usr/lib/xorg/Xorg                106MiB |
|    0   N/A  N/A      9639      G   /usr/lib/xorg/Xorg                288MiB |
|    0   N/A  N/A      9833      G   /usr/bin/gnome-shell              136MiB |
|    0   N/A  N/A     10493      G   ...AAAAAAAAA= --shared-files        7MiB |
|    0   N/A  N/A     81098      G   ...gAAAAAAAAA --shared-files      135MiB |
+-----------------------------------------------------------------------------+

I get this error when trying the quickstart example.

Feb 13 '21 11:02 lee-van-oetz

Can you share the output of nvidia-smi as well? (This shows what GPU you have, etc.)

Feb 16 '21 20:02 hawkinsp

Added now. Thanks for flagging.

Feb 16 '21 20:02 lee-van-oetz

I'm not sure why this happens, but I see the same thing with driver version 450.102.04 and CUDA 11.1.

According to NVIDIA (https://docs.nvidia.com/deploy/cuda-compatibility/index.html), these two versions should work together, and I don't know why they don't. I suggest either upgrading your driver or installing an older CUDA release.

Feb 16 '21 22:02 hawkinsp

Hello,

I am experiencing a similar problem, even with a newer version of jaxlib:

python3 -m pip install --upgrade jax jaxlib==0.1.62+cuda112 -f https://storage.googleapis.com/jax-releases/jax_releases.html

In my case, nvcc --version shows:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Jan_28_19:32:09_PST_2021
Cuda compilation tools, release 11.2, V11.2.142
Build cuda_11.2.r11.2/compiler.29558016_0

and nvidia-smi gives me this output:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The actual error that I am getting is the following:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    x = jnp.arange(10)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 3045, in arange
    return lax.iota(dtype, np.ceil(start)) # avoids materializing
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 1492, in iota
    return iota_p.bind(dtype=dtype, shape=(size,), dimension=0)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/core.py", line 284, in bind
    out = top_trace.process_primitive(self, tracers, params)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/core.py", line 622, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 241, in apply_primitive
    compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args), **params)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/util.py", line 198, in wrapper
    return cached(bool(config.x64_enabled), *args, **kwargs)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/util.py", line 191, in cached
    return f(*args, **kwargs)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 291, in xla_primitive_callable
    compiled = backend_compile(backend, built_c, options)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 355, in backend_compile
    return backend.compile(built_c, compile_options=options)
RuntimeError: Unknown: no kernel image is available for execution on the device
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(44): 'cuLinkAddData( link_state, CU_JIT_INPUT_CUBIN, static_cast<void*>(image.bytes.data()), image.bytes.size(), "", 0, nullptr, nullptr)'

Note that I manually pointed XLA at the CUDA installation by setting the environment variable: export XLA_FLAGS=--xla_gpu_cuda_data_dir=/gpfs/fs1/soft/swing/manual/cuda/11.2.1

Thanks for helping

Mar 12 '21 20:03 alelovato

We've confirmed this issue is caused by using a CUDA version that is too new for the installed driver. If you see this issue, the workaround is to use either an older CUDA release or a newer NVIDIA driver.

We may also be able to work around this at the JAX level in the future.
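One quick way to spot the mismatch described above is to compare the CUDA version nvidia-smi reports for the driver with the release nvcc reports for the toolkit. The sketch below does only that, parsing the output formats shown in this thread. It assumes the nvcc on PATH is the toolkit XLA actually uses (not guaranteed if --xla_gpu_cuda_data_dir points elsewhere), and because NVIDIA's minor-version compatibility means a newer toolkit is not always fatal, a True result is a hint rather than a verdict. All function names are hypothetical.

```python
import re
import subprocess


def driver_cuda_version(smi_output: str) -> tuple[int, int]:
    """Highest CUDA version the driver supports, from nvidia-smi's header line."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def toolkit_cuda_version(nvcc_output: str) -> tuple[int, int]:
    """CUDA toolkit release, from the 'release X.Y' part of nvcc --version."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field in nvcc output")
    return int(match.group(1)), int(match.group(2))


def toolkit_newer_than_driver(smi_output: str, nvcc_output: str) -> bool:
    """True if the toolkit release is newer than the driver's reported CUDA version."""
    return toolkit_cuda_version(nvcc_output) > driver_cuda_version(smi_output)


def check_local_install() -> bool:
    """Run nvidia-smi and nvcc on this machine and compare their versions."""
    smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
    nvcc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=True).stdout
    return toolkit_newer_than_driver(smi, nvcc)
```

For the first report in this thread (nvidia-smi showing "CUDA Version: 11.0", nvcc showing "release 11.1"), toolkit_newer_than_driver returns True.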

May 11 '21 13:05 hawkinsp

I encountered this problem on an HPC cluster where I'm not an admin, and setting XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 suppressed the error. I think it may be a problem with Singularity, but I just wanted to share my experience here.
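XLA_FLAGS holds a space-separated list of flags, so exporting only the parallelism flag would overwrite a flag such as the --xla_gpu_cuda_data_dir setting mentioned earlier in this thread. A minimal sketch that appends a flag from Python, assuming it runs before jax is imported (XLA reads these flags when the backend initializes). add_xla_flag is a hypothetical helper, not a JAX API.

```python
import os


def add_xla_flag(flag: str) -> str:
    """Append one flag to XLA_FLAGS, keeping any flags that are already set."""
    flags = os.environ.get("XLA_FLAGS", "").split()
    if flag not in flags:
        flags.append(flag)
    os.environ["XLA_FLAGS"] = " ".join(flags)
    return os.environ["XLA_FLAGS"]


# Run this before jax is imported: XLA reads XLA_FLAGS when the backend starts.
add_xla_flag("--xla_gpu_force_compilation_parallelism=1")
# import jax  # only after the flag is in place
```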

Jul 14 '21 09:07 kngwyu

Setting the XLA flag works for me as well, but it comes with this warning:

The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.

Losing multithreaded compilation sounds undesirable. Can we get a more permanent solution?

Aug 16 '21 20:08 jasonkyuyim

I'm also seeing this issue. I'm on NixOS 20.09.3301.42809feaa9f, jaxlib 0.1.71, and here's my nvidia-smi:

[nix-shell:~/dev/nixpkgs]$ nvidia-smi 
Sat Sep  4 20:47:26 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P0    25W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Based on the NVIDIA docs, it seems like these two versions should be compatible.

Passing XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 does work, but I'd rather not get hit with that slowdown.

Sep 04 '21 20:09 samuela

I just upgraded to NixOS 21.05.2796.110a2c9ebbf to get driver version 470.57.02, and the issue has gone away.

Sep 05 '21 19:09 samuela

I am using CentOS 7 and was having the issue mentioned here; it was solved by upgrading the NVIDIA driver from 460 to 470.74.

Oct 08 '21 12:10 A-Talavera

I'm having the same issue on an HPC using a singularity container. I'm not the admin so I can't update the nvidia driver. If a jax workaround that still allows multithreaded compilation is possible, that would be awesome!

Oct 13 '21 16:10 bantin

I'm getting the same error, and setting XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 doesn't work for me; it introduces a new error instead.

jax versions:

# packages in environment at /home/energy/amawi/miniconda3/envs/pansatz:
# Name    Version                 Build    Channel
jax       0.2.24                  pypi_0   pypi
jaxlib    0.1.73+cuda11.cudnn82   pypi_0   pypi

nvcc and nvidia-smi:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0

$ whereis nvcc
nvcc: /usr/local/cuda-11.4/bin/nvcc.profile /usr/local/cuda-11.4/bin/nvcc

CUDA is in /usr/local/cuda-11.4, with cuDNN v8.2.4.

Before setting the XLA flag:

2021-10-25 13:59:17.326622: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc:63] cuLinkAddData fails. This is usually caused by stale driver version.
2021-10-25 13:59:17.326873: E external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:1105] The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.

...

  File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 24, in run_vmc
    keys = rnd.PRNGKey(cfg['seed'])
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 122, in PRNGKey
    key = prng.seed_with_impl(impl, seed)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 203, in seed_with_impl
    return PRNGKeyArray(impl, impl.seed(seed))
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 241, in threefry_seed
    k1 = convert(lax.shift_right_logical(seed_arr, lax._const(seed_arr, 32)))
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 408, in shift_right_logical
    return shift_right_logical_p.bind(x, y)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 272, in bind
    out = top_trace.process_primitive(self, tracers, params)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 624, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 312, in apply_primitive
    **params)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/util.py", line 187, in wrapper
    return cached(config._trace_context(), *args, **kwargs)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/util.py", line 180, in cached
    return f(*args, **kwargs)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 335, in xla_primitive_callable
    prim.name, donated_invars, *arg_specs)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 654, in _xla_callable_uncached
    *arg_specs).compile().unsafe_call
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 770, in compile
    self.name, self.hlo(), *self.compile_args)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 798, in from_xla_computation
    compiled = compile_or_get_cached(backend, xla_computation, options)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 87, in compile_or_get_cached
    return backend_compile(backend, computation, compile_options)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 369, in backend_compile
    return backend.compile(built_c, compile_options=options)
RuntimeError: UNKNOWN: no kernel image is available for execution on the device in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(66): 'status'

After setting the flag:

  File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 26, in run_vmc
    keys = rnd.split(keys, cfg['n_devices']).reshape(cfg['n_devices'], 2)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 191, in split
    return _return_prng_keys(wrapped, _split(key, num))
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 177, in _split
    return key._split(num)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 191, in _split
    return PRNGKeyArray(self.impl, self.impl.split(self._keys, num))
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 422, in threefry_split
    return _threefry_split(key, int(num)) # type: ignore
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/api.py", line 419, in cache_miss
    donated_invars=donated_invars, inline=inline)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1632, in bind
    return call_bind(self, fun, *args, **params)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1623, in call_bind
    outs = primitive.process(top_trace, fun, tracers, params)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1635, in process
    return trace.process_call(self, fun, tracers, params)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 627, in process_call
    return primitive.impl(f, *tracers, **params)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 584, in _xla_call_impl
    out = compiled_fun(*args)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 977, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: the provided PTX was compiled with an unsupported toolchain.

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "demo_vmc.py", line 21, in <module>
    log = run_vmc(cfg)
  File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 26, in run_vmc
    keys = rnd.split(keys, cfg['n_devices']).reshape(cfg['n_devices'], 2)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 191, in split
    return _return_prng_keys(wrapped, _split(key, num))
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 177, in _split
    return key._split(num)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 191, in _split
    return PRNGKeyArray(self.impl, self.impl.split(self._keys, num))
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 422, in threefry_split
    return _threefry_split(key, int(num)) # type: ignore
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 977, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: the provided PTX was compiled with an unsupported toolchain.

Running without the XLA flag but with TF_CPP_MIN_LOG_LEVEL=0, as in issue #7118, returns the same error.

Setting export XLA_PYTHON_CLIENT_PREALLOCATE=false without the XLA flag (the solution in #7118) also returns the same error.

Oct 25 '21 11:10 xmax1

I have the same problem on an RTX 3090:

jax 0.2.25
jaxlib 0.1.74+cuda11.cudnn82
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
NVIDIA-SMI 470.74   Driver Version: 470.74   CUDA Version: 11.4

Running with --xla_gpu_force_compilation_parallelism=1 gives another error (possibly related to #8506):

RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
...

Edit: I got everything working by using conda-forge with this pending (not yet merged) jaxlib-gpu build: https://github.com/conda-forge/jaxlib-feedstock/pull/72

Nov 30 '21 18:11 lkhphuc


Hello, can you tell me specifically how to install this package? I downloaded it locally from https://anaconda.org/wolfv/jaxlib/files and then tried to install it with conda install --use-local jaxlib-0.1.73-cuda112py39h52c056e_0.tar.bz2, but it doesn't work.

Dec 20 '21 06:12 Machao-be-simple

@qinggeduoqing Install everything from conda-forge, e.g. conda install tensorflow jax -c conda-forge. Then run conda install jaxlib -c wolfv.

Dec 20 '21 14:12 lkhphuc

@lkhphuc Thank you so much for your reply, let me try this way...

Dec 21 '21 01:12 Machao-be-simple

I get an error with the following command:

>>> rng, init_rng = jax.random.split(jax.random.PRNGKey(1))
2022-02-02 17:44:08.505863: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2089] Execution of replica 0 failed: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/random.py", line 188, in split
    return _return_prng_keys(wrapped, _split(key, num))
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/random.py", line 174, in _split
    return key._split(num)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/prng.py", line 187, in _split
    return PRNGKeyArray(self.impl, self.impl.split(self._keys, num))
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/prng.py", line 435, in threefry_split
    return _threefry_split(key, int(num))  # type: ignore
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/traceback_util.py", line 165, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/api.py", line 430, in cache_miss
    out_flat = xla.xla_call(
  File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 1681, in bind
    return call_bind(self, fun, *args, **params)
  File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 1693, in call_bind
    outs = top_trace.process_call(primitive, fun, tracers, params)
  File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 594, in process_call
    return primitive.impl(f, *tracers, **params)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/dispatch.py", line 145, in _xla_call_impl
    out = compiled_fun(*args)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/dispatch.py", line 444, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: no kernel image is available for execution on the device

The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.

JAX can see the GPUs, though:

>>> jax.devices()
[GpuDevice(id=0, process_index=0), GpuDevice(id=1, process_index=0)]

I installed jaxlib 1.77.0 and jax 0.2.28 from source. I am using CUDA 11.5 and cuDNN 8.3.0. I made sure that the PATH environment variable is set up properly and that the Python session is loading the correct CUDA libraries. I'm not sure what else could be wrong. I'm running the program on Ubuntu 20.04 with two GTX 1080s.
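jax.devices() only enumerates devices; the "no kernel image" failure appears when a kernel is actually launched, as with the PRNG split above. A small smoke test that runs an elementwise op and a PRNG split on each visible device can separate the two cases. This is a sketch: smoke_test_devices is a hypothetical helper, and the raw uint32 key stands in for jax.random.PRNGKey(0) under JAX's default (legacy) key representation.

```python
def smoke_test_devices() -> dict[str, str]:
    """Launch a tiny computation and a PRNG split on each visible device.

    Returns {device name: "ok" or error text}; {} if jax is not installed.
    """
    try:
        import jax
        import numpy as np
    except ImportError:
        return {}

    results = {}
    for device in jax.devices():
        try:
            # Place inputs directly on the device so the computations run there.
            x = jax.device_put(np.arange(10, dtype=np.float32), device)
            (x + 1).block_until_ready()
            # Raw uint32[2] key; equals jax.random.PRNGKey(0) with default keys.
            key = jax.device_put(np.zeros(2, dtype=np.uint32), device)
            jax.random.split(key).block_until_ready()
            results[str(device)] = "ok"
        except Exception as err:  # e.g. RuntimeError or XlaRuntimeError
            results[str(device)] = f"{type(err).__name__}: {err}"
    return results
```

On an affected machine, a GPU entry would be expected to hold the RuntimeError text instead of "ok", even though jax.devices() lists the GPU.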

Feb 03 '22 01:02 mshafiei

@mshafiei Can you share the output of nvidia-smi?

Feb 03 '22 02:02 hawkinsp

@hawkinsp sure,

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 27%   34C    P8    10W / 180W |   7328MiB /  8116MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 28%   31C    P8     6W / 180W |    212MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    214915      C   python                            107MiB |
|    0   N/A  N/A    266904      C   python                           7219MiB |
|    1   N/A  N/A    214915      C   python                            105MiB |
|    1   N/A  N/A    266904      C   python                            105MiB |
+-----------------------------------------------------------------------------+

Feb 03 '22 02:02 mshafiei

Hmm. Interesting. I would have expected that to work. My best suggestion for getting unblocked would be to build jaxlib from source, explicitly opting in to the CUDA compute capability of your device (https://jax.readthedocs.io/en/latest/developer.html). build.py has a flag that takes a list of CUDA compute capabilities (try --help).

Feb 03 '22 02:02 hawkinsp

@hawkinsp I am actually building jaxlib from source and passing the cuda specifications as below,

python ./build/build.py \
  --enable_cuda \
  --cuda_path='/usr/local/cuda-11.5' \
  --cudnn_path='/usr/local/cuda-11.5' \
  --cuda_version='11.5' \
  --cudnn_version='8.3.0' \
  --cuda_compute_capabilities 8.0 \
  --noenable_mkl_dnn

Are these flags what you were referring to?
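For reference, the GTX 1080 is a Pascal card with compute capability 6.1, whereas the command above builds for 8.0 only. Machine code built for 8.0 cannot run on a 6.x device, which may explain why jaxlib's own PRNG kernel reports no kernel image. A sketch that derives the flag value from the GPUs actually present, assuming a driver recent enough to support nvidia-smi's compute_cap query field (older drivers may not; deviceQuery from the CUDA samples reports the same number). The sm_XY spelling follows hawkinsp's later example (--cuda_compute_capabilities=sm_50); check build.py --help for what your checkout accepts. The function names are hypothetical.

```python
import subprocess


def parse_compute_caps(csv_output: str) -> list[str]:
    """Turn 'name, X.Y' CSV lines into sorted, de-duplicated sm_XY targets."""
    caps = set()
    for line in csv_output.strip().splitlines():
        cap = line.rsplit(",", 1)[1].strip()
        major, minor = cap.split(".")
        caps.add((int(major), int(minor)))
    return [f"sm_{major}{minor}" for major, minor in sorted(caps)]


def query_local_compute_caps() -> list[str]:
    """Ask nvidia-smi for each GPU's compute capability (needs a recent driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_caps(out)


def build_command(targets: list[str]) -> str:
    """Assemble a build.py invocation for the given SM targets."""
    return "python build/build.py --enable_cuda --cuda_compute_capabilities=" + ",".join(targets)
```

With two GTX 1080s, parse_compute_caps collapses both lines to ["sm_61"].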

Feb 03 '22 02:02 mshafiei

In my case, making sure that nvcc --version and nvidia-smi were both at the same version (11.4 in my case) fixed the problem.

When nvcc --version was at 11.7:

Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

and nvidia-smi was showing cuda version 11.4:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+

I was getting the error above and could only work around it by setting export XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1. Note, however, that this disables parallelism when compiling your model, so if you have a big model it comes at a significant cost. In my case, parallel compilation reduced the XLA compilation time from ~4.5 minutes to ~1.5 minutes.

May 20 '22 11:05 danieldanciu

@lee-van-oetz is this fixed now?

Aug 12 '22 19:08 sudhakarsingh27

I'm pretty sure this is fixed in recent jaxlib releases. We added code to jaxlib that falls back to not using cuLink... if the driver version is too old.

Dec 06 '22 16:12 hawkinsp

Hi, I am also facing this problem.

I use the following:

Windows 10, build 19044 (21H2)
Visual Studio 2019
NVIDIA GeForce GTX 960M, Maxwell, compute capability 5.0
CUDA Toolkit 11.7, 11.8, or 12.0

I want to write simple CUDA C++ code.

After installing all the C++ packages in VS2019, I installed the CUDA Toolkit. When I run the sample code, I get the following error:

No kernel image is available for execution on the device.

I tried CUDA 11.7, 11.8, and 12.0 with both VS2019 and VS2022, but the error persists. I have been facing this error for 20 days and I am really fed up. deviceQuery.exe, nvidia-smi, and nvcc --version all run fine. I have also checked the NVIDIA site and others, and my GPU should be supported by these CUDA versions. What could cause this error?

Please help me out. Thanks, all.
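The thread doesn't resolve this report, but one common cause of this message in hand-written CUDA code is an executable that contains neither machine code (SASS) for the GPU's compute capability (5.0 here) nor PTX the driver can JIT-compile for it. A sketch that builds nvcc -gencode arguments embedding SASS for each listed capability plus PTX for the newest one. gencode_flags and kernel.cu are hypothetical; in Visual Studio the equivalent setting lives in the CUDA project properties rather than on a command line.

```python
def _cc_digits(cap: str) -> str:
    """'5.0' -> '50', the form nvcc expects in compute_XY / sm_XY."""
    major, minor = cap.split(".")
    return f"{int(major)}{int(minor)}"


def gencode_flags(compute_caps: list[str], embed_ptx: bool = True) -> list[str]:
    """Build nvcc -gencode arguments for compute capabilities such as '5.0'."""
    flags = []
    for cap in compute_caps:
        d = _cc_digits(cap)
        # SASS (machine code) for exactly this architecture.
        flags += ["-gencode", f"arch=compute_{d},code=sm_{d}"]
    if embed_ptx and compute_caps:
        # PTX for the newest listed architecture, so later GPUs can JIT-compile it.
        newest = max(compute_caps, key=lambda c: tuple(int(p) for p in c.split(".")))
        d = _cc_digits(newest)
        flags += ["-gencode", f"arch=compute_{d},code=compute_{d}"]
    return flags
```

For example, " ".join(["nvcc", *gencode_flags(["5.0"]), "kernel.cu", "-o", "kernel"]) yields an nvcc command that targets the GTX 960M directly.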

Feb 11 '23 09:02 hosein-cnn

Is anyone else still struggling with this? I've tried dozens of combinations of NVIDIA drivers, Ubuntu's nvidia-cuda-toolkit, CUDA versions installed from NVIDIA's website, jaxlib, and jax, and nothing has solved the problem.

I'm using Ubuntu 23.04 with a GeForce 940MX.

nvidia-smi Driver Version: 525.147.05 CUDA Version: 12.0

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

jaxlib==0.4.20+cuda11.cudnn86 jax==0.4.20

print(jax.devices()) shows the GPU [cuda(id=0)]

But trying to use it results in XlaRuntimeError: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.custom_call' failed: jaxlib/gpu/prng_kernels.cc:33: operation gpuGetLastError() failed: no kernel image is available for execution on the device; current tracing scope: custom-call.1; current profiling annotation: XlaModule:#hlo_module=jit__normal,program_id=2#.

FWIW, I had JAX working fine just a few days ago; this might be caused by a recent update to 23.04.

Nov 22 '23 16:11 deoxyribose

@deoxyribose I think the problem is that we are building for GPUs with SM version 5.2 at a minimum: https://github.com/google/jax/blob/961ba3cd4290e4f30573a12a5f7ae3db26856320/.bazelrc#L68

but your GPU appears to have SM version 5.0.

The fix is to build jaxlib yourself, explicitly specifying your SM version. Try:

python build/build.py --enable_cuda --cuda_compute_capabilities=sm_50

We've actually never shipped support for that model of GPU.
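The rule behind this answer can be written down: a cubin built for sm_XY runs only on devices with the same major version X and a minor version of at least Y, while PTX for compute_XY can be JIT-compiled by the driver for any device at X.Y or newer. So an sm_52 cubin does not run on a 5.0 GPU. The sketch below encodes that rule; can_run is a hypothetical helper, it does not handle architecture-specific targets such as sm_90a, and which cubin and PTX targets a given jaxlib build actually embeds is set by its build configuration, so the target lists here are illustrative.

```python
from collections.abc import Sequence


def parse_cc(cc: str) -> tuple[int, int]:
    """'5.2', 'sm_52' or 'compute_52' -> (5, 2)."""
    if "_" in cc:
        digits = cc.split("_", 1)[1]
        return int(digits[:-1]), int(digits[-1])
    major, minor = cc.split(".")
    return int(major), int(minor)


def can_run(device_cc: str, cubin_targets: Sequence[str], ptx_targets: Sequence[str] = ()) -> bool:
    """Approximate CUDA binary/PTX compatibility check."""
    dev_major, dev_minor = parse_cc(device_cc)
    # A cubin for sm_XY runs on the same major version with minor >= Y.
    for target in cubin_targets:
        major, minor = parse_cc(target)
        if major == dev_major and minor <= dev_minor:
            return True
    # PTX for compute_XY can be JIT-compiled for any device at X.Y or newer.
    return any(parse_cc(target) <= (dev_major, dev_minor) for target in ptx_targets)
```

can_run("5.0", ["sm_52"]) is False, matching the GeForce 940MX case; adding "sm_50" to the cubin targets makes it True.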

Nov 22 '23 16:11 hawkinsp

@deoxyribose #18644 will add sm_50 support to the next jaxlib release.

Nov 22 '23 16:11 hawkinsp

@hawkinsp Thanks for the quick reply! When I try to build, I get ERROR: @xla//xla/python:enable_gpu :: Error loading option @xla//xla/python:enable_gpu: no such package '@local_config_cuda//cuda': Repository command failed Inconsistent CUDA toolkit path: /usr vs /usr/lib

I tried removing and reinstalling CUDA with sudo apt install nvidia-cuda-toolkit.

I'm not sure if this is the expected location:

$ whereis cuda
cuda: /usr/lib/cuda /usr/include/cuda

$ which nvcc
/usr/bin/nvcc

$ whereis cuda.h
cuda.h: /usr/include/cuda.h

I tried python build/build.py --enable_cuda --cuda_compute_capabilities=sm_50 --cuda_path=/usr/, but it made no difference. Any tips?

Nov 22 '23 17:11 deoxyribose
