RuntimeError: No available kernel. Aborting execution.
When I run the inference script, I get a `RuntimeError: No available kernel. Aborting execution.` error:
```
A100 GPU detected, using flash attention if input tensor is on cuda
0%| | 0/251 [00:00<?, ?it/s]/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:659.)
out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:450.)
out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Flash attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:661.)
out = F.scaled_dot_product_attention(
/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py:100: UserWarning: Expected query, key and value to all be of dtype: {Half, BFloat16}. Got Query dtype: float, Key dtype: float, and Value dtype: float instead. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:100.)
out = F.scaled_dot_product_attention(
0%| | 0/251 [00:00<?, ?it/s]
Traceback (most recent call last):
... <truncated>
File "/home/azureuser/PaLM/.venv/lib/python3.8/site-packages/palm_rlhf_pytorch/attention.py", line 100, in flash_attn
out = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.
```
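If I'm reading the last warning correctly, the flash and memory-efficient kernels only accept `Half`/`BFloat16` inputs, so float32 query/key/value tensors leave no kernel to dispatch to once the math fallback is disabled. Here is a minimal sketch of that constraint (the shapes are arbitrary, and I'm using `torch.backends.cuda.sdp_kernel` to mimic the backend restriction the warnings imply):

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda")  # float32 by default

# Restrict SDPA to the flash kernel only, mirroring what the warnings suggest
# the library is doing internally.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    F.scaled_dot_product_attention(q, q, q)  # RuntimeError: No available kernel.

# Casting to half (or bfloat16) satisfies the flash kernel's dtype requirement.
q16 = q.half()
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q16, q16, q16)  # runs on A100
```

If that is indeed what is happening, casting the model and inputs to half precision (or running inference under `torch.autocast`) should let the flash kernel dispatch, but I'd like to confirm.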
I tried installing the PyTorch nightly build, but that did not help:

```
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
```
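For reference, here is a quick check of what the installed build reports about the SDPA backends and the GPU (these `torch.backends.cuda` helpers report whether a backend is globally enabled, not whether it will be chosen for a specific input):

```python
import torch

print(torch.__version__, torch.version.cuda)
print("flash enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math enabled:         ", torch.backends.cuda.math_sdp_enabled())
print("device:", torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))
```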
CUDA toolkit version (from `nvcc`):

```
/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
```
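As I understand it, `nvcc` only reports the locally installed toolkit (11.1 here), not the driver, and the pip wheel ships its own CUDA 12.1 runtime, so the local toolkit version shouldn't affect kernel selection as long as the driver is recent enough. `nvidia-smi` shows the actual driver version; from Python, this confirms what the wheel was built against:

```python
import torch

# The CUDA version the wheel was built with, independent of /usr/local/cuda.
print("built with CUDA:", torch.version.cuda)  # expected: 12.1 for this wheel
print("CUDA available: ", torch.cuda.is_available())
print("device:", torch.cuda.get_device_name(0))
```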
PyTorch version:
```
pip3 show torch
Name: torch
Version: 2.1.0.dev20230618+cu121
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /home/azureuser/PaLM/.venv/lib/python3.8/site-packages
Requires: filelock, pytorch-triton, sympy, networkx, jinja2, fsspec, typing-extensions
Required-by: torchvision, torchaudio, PaLM-rlhf-pytorch, lion-pytorch, accelerate
```
Any idea what could cause this?