llm-foundry
Eval `gpt2` fails with `CUBLAS_STATUS_NOT_INITIALIZED`
Running
python eval/eval.py eval/yamls/hf_eval.yaml icl_tasks=eval/yamls/winograd.yaml model_name_or_path=gpt2
fails with
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
The same command runs fine with other models (e.g., `EleutherAI/gpt-neo-125M`).
Any ideas what could be going wrong in the case of `gpt2`?
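One possibility worth ruling out, which is an assumption and not verified here: this cuBLAS error is often a secondary symptom of an earlier GPU-side failure, such as an embedding lookup with an out-of-range index. `gpt2` has a position-embedding table of 1024 entries, while `EleutherAI/gpt-neo-125M` has 2048, so a sequence length above 1024 would break only `gpt2`. The sketch below checks that condition. The helper name `exceeds_position_limit` and the value `assumed_max_seq_len` are hypothetical; the actual `max_seq_len` in `hf_eval.yaml` is not shown in this report.

```python
# Position-embedding table sizes from the published Hugging Face configs
# (gpt2: n_positions, gpt-neo-125M: max_position_embeddings).
POSITION_LIMITS = {
    "gpt2": 1024,
    "EleutherAI/gpt-neo-125M": 2048,
}


def exceeds_position_limit(model_name: str, max_seq_len: int) -> bool:
    """Return True if a sequence of max_seq_len tokens would index past
    the model's position-embedding table."""
    return max_seq_len > POSITION_LIMITS[model_name]


# Hypothetical value for illustration only; substitute the max_seq_len
# that the eval YAML actually sets.
assumed_max_seq_len = 2048

for name in POSITION_LIMITS:
    status = "too long" if exceeds_position_limit(name, assumed_max_seq_len) else "ok"
    print(f"{name}: max_seq_len={assumed_max_seq_len} -> {status}")
```

If the check flags `gpt2`, lowering the sequence length to 1024 would be the thing to try. Independently of this guess, rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1`, or on CPU, usually surfaces the underlying error instead of the cuBLAS one.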
Environment
System Environment Report
Created: 2023-06-22 15:38:44 UTC
PyTorch information
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.31

Python version: 3.10.11 (main, Apr 5 2023, 14:15:10) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.19.0-1026-gcp-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.7.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB

Nvidia driver version: 520.61.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.2
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.13.1+cu117
[pip3] torch-optimizer==0.3.0
[pip3] torchmetrics==0.11.3
[pip3] torchtext==0.14.1
[pip3] torchvision==0.14.1+cu117
[conda] Could not collect
Composer information
Composer version: 0.14.1
Composer commit hash: None
Host processor model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Host processor core count: 12
Number of nodes: 1
Accelerator model name: NVIDIA A100-SXM4-40GB
Accelerators per node: 1
CUDA Device Count: 2