dolly Issue running on A100

Hi,

I tried following the directions to run this on an A100:

from transformers import pipeline
import torch

instruct_pipeline = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

instruct_pipeline("Explain to me the difference between nuclear fission and fusion.")

I see errors like:

../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [80,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
....
....
python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:232, in GPTNeoXAttention._attn(self, query, key, value, attention_mask, head_mask)
    229 if head_mask is not None:
    230     attn_weights = attn_weights * head_mask
--> 232 attn_output = torch.matmul(attn_weights, value)
    233 return attn_output, attn_weights

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`

Setup was:

pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"

And for CUDA I have:

$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Any advice on what to try to resolve this?

Apr 25 '23 17:04 mlucool

Do you have cublas installed, and at a matching version for your CUDA drivers?

Apr 25 '23 21:04 srowen

Thanks for the quick reply.

I believe so:

$ cat /usr/local/cuda/include/cublas_api.h | grep _VER_
#define CUBLAS_VER_MAJOR 11
#define CUBLAS_VER_MINOR 11
#define CUBLAS_VER_PATCH 3
#define CUBLAS_VER_BUILD 6
#define CUBLAS_VERSION (CUBLAS_VER_MAJOR * 10000 + CUBLAS_VER_MINOR * 100 + CUBLAS_VER_PATCH)
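
For reference, the macros above work out to a single integer: 11 * 10000 + 11 * 100 + 3 = 111103, i.e. cuBLAS 11.11.3 (build 6). A minimal sketch of that arithmetic, assuming the header text has the same `#define` layout as shown; `parse_cublas_version` and `SAMPLE_HEADER` are hypothetical names used only for illustration:

```python
import re

# Header excerpt copied from the grep output above.
SAMPLE_HEADER = """\
#define CUBLAS_VER_MAJOR 11
#define CUBLAS_VER_MINOR 11
#define CUBLAS_VER_PATCH 3
#define CUBLAS_VER_BUILD 6
#define CUBLAS_VERSION (CUBLAS_VER_MAJOR * 10000 + CUBLAS_VER_MINOR * 100 + CUBLAS_VER_PATCH)
"""

_DEFINE = re.compile(r"#define\s+CUBLAS_VER_(MAJOR|MINOR|PATCH|BUILD)\s+(\d+)")


def parse_cublas_version(header_text):
    """Return the numeric parts of the cuBLAS version plus the combined integer."""
    parts = {name.lower(): int(value) for name, value in _DEFINE.findall(header_text)}
    # Same formula as the CUBLAS_VERSION macro in the header.
    parts["version"] = parts["major"] * 10000 + parts["minor"] * 100 + parts["patch"]
    return parts


if __name__ == "__main__":
    print(parse_cublas_version(SAMPLE_HEADER))
```

Note that this only describes the system toolkit under /usr/local/cuda, which is not necessarily the cuBLAS that PyTorch loads at runtime.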

Apr 25 '23 22:04 mlucool

Not sure. It runs for me on CUDA 11.3 and 11.7, per the code in the repo. I suspect it's something in the environment, but I can't tell what if all the libraries are definitely installed at matching versions, as shown.
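
One way to narrow down an environment problem is to ask PyTorch what it reports, rather than the system toolkit. The sketch below is only a diagnostic aid, not a fix. It assumes PyTorch was installed from pip wheels, which, as far as I know, usually ship their own CUDA libraries, so the `nvcc` version may differ from what PyTorch actually uses. `describe_torch_cuda` is a hypothetical helper name.

```python
def describe_torch_cuda():
    """Collect the CUDA facts PyTorch itself reports, if PyTorch is importable."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter; nothing more to report.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the installed PyTorch build targets (None for CPU-only builds).
        "torch_built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # The failing call in the traceback uses bfloat16 (CUDA_R_16BF).
        info["bf16_supported"] = torch.cuda.is_bf16_supported()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `torch_built_for_cuda` differs from the 11.8 toolkit reported by `nvcc`, that mismatch is worth including in the report. It may or may not be related to the error.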

Apr 26 '23 14:04 srowen

Thanks for the quick reply.

I believe so:

$ cat /usr/local/cuda/include/cublas_api.h | grep _VER_
#define CUBLAS_VER_MAJOR 11
#define CUBLAS_VER_MINOR 11
#define CUBLAS_VER_PATCH 3
#define CUBLAS_VER_BUILD 6
#define CUBLAS_VERSION (CUBLAS_VER_MAJOR * 10000 + CUBLAS_VER_MINOR * 100 + CUBLAS_VER_PATCH)

How can this be solved?

May 27 '23 07:05 kevinuserdd
