Issue running on A100

Open mlucool opened this issue 2 years ago • 3 comments

Hi,

I tried following the directions to run this on an A100:

from transformers import pipeline
import torch

instruct_pipeline = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

instruct_pipeline("Explain to me the difference between nuclear fission and fusion.")

I see errors like:

../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [80,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
....
....
python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:232, in GPTNeoXAttention._attn(self, query, key, value, attention_mask, head_mask)
    229 if head_mask is not None:
    230     attn_weights = attn_weights * head_mask
--> 232 attn_output = torch.matmul(attn_weights, value)
    233 return attn_output, attn_weights

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`

Setup was:

pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"

And for CUDA I have:

 $ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Any advice on what to try to resolve this?

mlucool avatar Apr 25 '23 17:04 mlucool

Do you have cublas installed, and at a matching version for your CUDA drivers?
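One way to check this from Python is sketched below. It assumes a pip-installed PyTorch; those wheels generally bundle their own CUDA libraries, so the toolkit under /usr/local/cuda may not be the one PyTorch actually loads. The helper name `report_versions` is made up for illustration and is not part of the dolly repo.

```python
from importlib import metadata


def report_versions():
    """Collect package and CUDA details relevant to this issue (illustrative helper)."""
    info = {}
    # Installed package versions, read from metadata without importing the packages.
    for pkg in ("torch", "transformers", "accelerate"):
        try:
            info[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            info[pkg] = None

    try:
        import torch
    except ImportError:
        # Without torch there is nothing more to report.
        return info

    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["torch_built_with_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
        info["bf16_supported"] = torch.cuda.is_bf16_supported()
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

If `torch_built_with_cuda` differs from what `nvcc --version` prints, that alone is not necessarily a problem, but it shows which CUDA build is really in play.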

srowen avatar Apr 25 '23 21:04 srowen

Thanks for the quick reply.

I believe so:

$ cat /usr/local/cuda/include/cublas_api.h | grep _VER_
#define CUBLAS_VER_MAJOR 11
#define CUBLAS_VER_MINOR 11
#define CUBLAS_VER_PATCH 3
#define CUBLAS_VER_BUILD 6
#define CUBLAS_VERSION (CUBLAS_VER_MAJOR * 10000 + CUBLAS_VER_MINOR * 100 + CUBLAS_VER_PATCH)

mlucool avatar Apr 25 '23 22:04 mlucool

Not sure. It runs for me on CUDA 11.3 and 11.7, according to the code in the repo. I suspect it's something in the environment, but I can't say what if you definitely have all the libraries installed at the versions shown.
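To narrow down the environment, something like the sketch below might help. CUDA kernels launch asynchronously, so the call that Python blames is not always the one that failed; in the trace above the index assertion in ScatterGatherKernel.cu is printed before the cuBLAS error, so the cuBLAS failure may only be a downstream symptom. Setting `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous, and a tiny bfloat16 batched matmul checks that kind of operation in isolation. The function name and tensor shapes are arbitrary choices for illustration.

```python
import os

# Must be set before CUDA is initialized, so do it before importing torch.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def bf16_matmul_smoke_test():
    """Run a small batched bf16 matmul on the GPU and return a status string."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"

    # Small batched matmul in bfloat16, the dtype used in the failing call.
    a = torch.randn(4, 8, 16, device="cuda", dtype=torch.bfloat16)
    b = torch.randn(4, 16, 8, device="cuda", dtype=torch.bfloat16)
    out = torch.matmul(a, b)
    # Wait for the kernel so any device error is raised here.
    torch.cuda.synchronize()
    return f"ok, output shape {tuple(out.shape)}"


if __name__ == "__main__":
    print(bf16_matmul_smoke_test())
```

If this passes but the pipeline still fails with the variable set, the traceback should then point at the operation that actually trips the index assertion.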

srowen avatar Apr 26 '23 14:04 srowen

How can this be solved?

kevinuserdd avatar May 27 '23 07:05 kevinuserdd