dolly
Issue running on A100
Hi,
I tried following the directions to run this on an A100:
from transformers import pipeline
import torch
instruct_pipeline = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
instruct_pipeline("Explain to me the difference between nuclear fission and fusion.")
I see errors like:
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [80,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
....
....
python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:232, in GPTNeoXAttention._attn(self, query, key, value, attention_mask, head_mask)
229 if head_mask is not None:
230 attn_weights = attn_weights * head_mask
--> 232 attn_output = torch.matmul(attn_weights, value)
233 return attn_output, attn_weights
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Setup was:
pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"
And for CUDA I have:
$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Any advice on what to try to resolve this?
Do you have cuBLAS installed, at a version that matches your CUDA drivers?
Thanks for the quick reply.
I believe so:
$ cat /usr/local/cuda/include/cublas_api.h | grep _VER_
#define CUBLAS_VER_MAJOR 11
#define CUBLAS_VER_MINOR 11
#define CUBLAS_VER_PATCH 3
#define CUBLAS_VER_BUILD 6
#define CUBLAS_VERSION (CUBLAS_VER_MAJOR * 10000 + CUBLAS_VER_MINOR * 100 + CUBLAS_VER_PATCH)
Not sure. It runs for me on CUDA 11.3 and 11.7, according to the code in the repo. I suspect the cause is something in your environment, but I can't tell what if you definitely have all the libraries installed at the versions shown.
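One thing that may help narrow down an environment problem: as far as I know, a pip-installed torch wheel usually brings its own CUDA runtime and cuBLAS, so the toolkit that nvcc reports is not necessarily the one torch uses. Here is a minimal sketch that prints what torch itself sees. The helper name `describe_cuda_env` is made up for illustration, and `shutil.which` only finds nvcc if it is on your PATH.

```python
import shutil

try:
    import torch
except ImportError:  # torch is not installed in this environment
    torch = None


def describe_cuda_env():
    """Collect the CUDA-related facts that torch reports (hypothetical helper)."""
    report = {"torch_installed": torch is not None}
    if torch is not None:
        report["torch_version"] = torch.__version__
        # CUDA version the wheel was built against; None for CPU-only builds.
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            report["device_name"] = torch.cuda.get_device_name(0)
            report["bf16_supported"] = torch.cuda.is_bf16_supported()
    # Path of the system nvcc, or None if it is not on PATH.
    report["system_nvcc"] = shutil.which("nvcc")
    return report


for key, value in describe_cuda_env().items():
    print(f"{key}: {value}")
```

If `torch_cuda_build` differs from the release that nvcc prints, the system toolkit headers are probably not the libraries torch is loading.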
How can this be solved?
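Nothing in this thread pins down the cause, so this is a debugging step, not a fix. CUDA errors are reported asynchronously, which means the cuBLAS failure in the traceback may only be a follow-on of the earlier "index out of bounds" assertion. Setting `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialized makes kernel launches synchronous, so the Python stack trace should point at the operation that actually failed. A minimal sketch, reusing the pipeline call from the original post; the wrapper name `run_prompt` is made up for illustration.

```python
import os

# Must be set before CUDA is initialized, so set it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def run_prompt(prompt):
    """Build the pipeline from the original post and run one prompt (hypothetical wrapper)."""
    import torch
    from transformers import pipeline

    instruct_pipeline = pipeline(
        model="databricks/dolly-v2-12b",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
    )
    return instruct_pipeline(prompt)
```

Calling `run_prompt("Explain to me the difference between nuclear fission and fusion.")` in a fresh Python process should then fail at the line that triggers the assertion, and that traceback is the useful thing to post here.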