ChatGLM-6B

[BUG/Help] <title> RuntimeError: Library cudart is not initialized

Open rogerrojur opened this issue 2 years ago • 27 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|██████████| 8/8 [00:09<00:00, 1.14s/it]

Traceback (most recent call last):

/data/text2music/ChatGLM-6B/cli_demo1.py:5 in <module>

    2 from transformers import AutoTokenizer, AutoModel
    3
    4 tokenizer = AutoTokenizer.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_rem
  ❱ 5 model = AutoModel.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code
    6 model = model.eval()
    7
    8 history = []

/home/user_00/.cache/huggingface/modules/transformers_modules/local/modeling_chatglm.py:1154 in quantize

    1151
    1152     def quantize(self, bits: int):
    1153         from .quantization import quantize
  ❱ 1154         self.transformer = quantize(self.transformer, bits)
    1155         return self
    1156

/home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:147 in quantize

    144     """Replace fp16 linear with quantized linear"""
    145
    146     for layer in model.layers:
  ❱ 147         layer.attention.query_key_value = QuantizedLinear(
    148             weight_bit_width=weight_bit_width,
    149             weight_tensor=layer.attention.query_key_value.weight.to(torch.cuda.current_d
    150             bias_tensor=layer.attention.query_key_value.bias,

/home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:130 in __init__

    127             self.weight_scale = (weight_tensor.abs().max(dim=-1).values / ((2 ** (weight
    128             self.weight = torch.round(weight_tensor / self.weight_scale[:, None]).to(tor
    129             if weight_bit_width == 4:
  ❱ 130                 self.weight = compress_int4_weight(self.weight)
    131
    132         self.weight = Parameter(self.weight.to(kwargs["device"]), requires_grad=False)
    133         self.weight_scale = Parameter(self.weight_scale.to(kwargs["device"]), requires_g

/home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:71 in compress_int4_weight

    68         gridDim = (n, 1, 1)
    69         blockDim = (min(round_up(m, 32), 1024), 1, 1)
    70
  ❱ 71         kernels.int4WeightCompression(
    72             gridDim,
    73             blockDim,
    74             0,

/data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/kernels/base.py:48 in __call__

    45             sharedMemBytes : int, stream : cudart.cudaStream_t, params : List[Any] ) ->
    46         assert len(gridDim) == 3
    47         assert len(blockDim) == 3
  ❱ 48         func = self._prepare_func()
    49
    50         cuda.cuLaunchKernel(func,
    51             gridDim[0], gridDim[1], gridDim[2],

/data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/kernels/base.py:36 in _prepare_func

    33         self._func_name = func_name
    34
    35     def _prepare_func(self):
  ❱ 36         curr_device = cudart.cudaGetDevice()
    37         cudart.cudaSetDevice(curr_device) # ensure cudart context
    38         if curr_device not in self._funcs:
    39             self._funcs[curr_device] = cuda.cuModuleGetFunction(

/data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/library/base.py:72 in wrapper

    69             def decorator(f):
    70                 @wraps(f)
    71                 def wrapper(*args, **kwargs):
  ❱ 72                     raise RuntimeError("Library %s is not initialized" % self.__name)
    73                 return wrapper
    74             return decorator
    75         else:

RuntimeError: Library cudart is not initialized

Expected Behavior

I only call the quantize function to convert the model to int4, but this exception is raised. How can I fix this so that ChatGLM-6B quantizes successfully?

Steps To Reproduce

import os
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code=True)
model = AutoModel.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code=True).half().quantize(4).cuda(device=2)

Environment

- OS: Ubuntu 20.04
- Python: 3.7
- Transformers: 4.26.1
- PyTorch: 1.13
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True

Anything else?

No response

rogerrojur avatar Mar 17 '23 02:03 rogerrojur

Same problem. Have you solved this?

Adenialzz avatar Mar 17 '23 03:03 Adenialzz

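Check that CUDA is installed correctly on your machine, or try adding the CUDA bin directory to PATH. I reinstalled CUDA and set PATH, after which the problem was resolved and everything ran normally.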

188080501 avatar Mar 17 '23 10:03 188080501

You said to add the CUDA bin directory to the path. Which path do you mean? The project path?

Chenny0808 avatar Mar 18 '23 05:03 Chenny0808

Same problem.

mh739025250 avatar Mar 23 '23 04:03 mh739025250

  1. First, find the library file whose name starts with "nvrtc" inside the torch package in your environment. For example, on Windows with miniconda, mine is at C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll. The path will differ between platforms.
  2. Add the directory containing that file to PATH. If you would rather not modify the system-wide PATH, you can add it at the top of the script, right after import os, for example: os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'
  3. After that, it should run.
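The manual search in step 1 can be scripted. Below is a minimal sketch, not the commenter's code: the helper names `find_dirs_with_prefix` and `append_to_path` are hypothetical, and it assumes (as the comment suggests for Windows) that the missing libraries are looked up through PATH.

```python
import os


def find_dirs_with_prefix(root, prefix):
    """Return directories under `root` holding a file whose name starts with `prefix`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Case-insensitive match, since Windows file names are case-insensitive.
        if any(name.lower().startswith(prefix.lower()) for name in filenames):
            matches.append(dirpath)
    return matches


def append_to_path(directory, environ=os.environ):
    """Append `directory` to PATH unless it is already listed; return the new PATH."""
    entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in entries:
        entries.append(directory)
    environ["PATH"] = os.pathsep.join(entries)
    return environ["PATH"]
```

One would call `find_dirs_with_prefix` on the environment's site-packages directory with the prefix "nvrtc", then pass the result to `append_to_path` before the model is loaded. Whether this helps on Linux is uncertain, because shared libraries there are usually not resolved through PATH.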

AnduFalaH avatar Mar 23 '23 08:03 AnduFalaH

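If you manage your environment with conda:
First, run conda list | grep cuda to find the CUDA runtime version of that environment, e.g. 11.7.
Then install cudatoolkit from the nvidia channel: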

conda install cudatoolkit=11.7 -c nvidia
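As an alternative to reading the version from conda list, the CUDA version that PyTorch was built against is exposed as `torch.version.cuda`. The sketch below turns that value into the command quoted above. The helper names are hypothetical, and it assumes a cudatoolkit package with the same major.minor version exists on the nvidia channel.

```python
def cuda_major_minor(version):
    """Reduce a version string such as '11.7.99' to 'major.minor' ('11.7')."""
    parts = version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError("unexpected CUDA version string: %r" % version)
    return ".".join(parts[:2])


def conda_install_command(version):
    """Build the conda command quoted in this thread for a given CUDA version."""
    return "conda install cudatoolkit=%s -c nvidia" % cuda_major_minor(version)


def torch_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds
```

If `torch_cuda_version()` returns a string, `conda_install_command` applied to it gives the command to run. A result of None means PyTorch is missing or is a CPU-only build.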

mjysci avatar Mar 24 '23 13:03 mjysci

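If you manage your environment with conda: first, run conda list | grep cuda to find the CUDA runtime version of that environment, e.g. 11.7. Then install cudatoolkit from the nvidia channel: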

conda install cudatoolkit=11.7 -c nvidia

I tested this and it solves the problem. My environment:

Windows 11 + WSL2 Debian
pytorch==2.0.0
transformers==4.26.1

LucienShui avatar Mar 24 '23 17:03 LucienShui

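If you manage your environment with conda: first, run conda list | grep cuda to find the CUDA runtime version of that environment, e.g. 11.7. Then install cudatoolkit from the nvidia channel: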

conda install cudatoolkit=11.7 -c nvidia

it works, :)

RRRoger avatar Mar 28 '23 14:03 RRRoger

I hit the same problem in WSL2. Following Microsoft's recommendation, I did not install any CUDA toolkit inside WSL, and I got the error above: "RuntimeError: Library cudart is not initialized".

judgementc avatar Apr 01 '23 12:04 judgementc

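I have the same problem. As far as I can tell, the suggestions above are nonsense; this is simply not an environment problem. How do I fix it? This is frustrating: a few lines of code and so many compatibility problems. (two images attached)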

gg22mm avatar Apr 03 '23 07:04 gg22mm

I ran into this too and could not find a fix. For now, removing the --quantization_bit 4 option when training, that is, giving up 4-bit quantization, lets it run.
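For code that calls `quantize` directly, as in the reproduction script, the same workaround can be written as a fallback. This is only a stopgap sketch: `quantize_or_fallback` is a hypothetical helper. It assumes the failure happens before any layer has been replaced, as in the traceback at the top of this issue, and that the GPU has enough memory for the unquantized model.

```python
def quantize_or_fallback(model, bits):
    """Try model.quantize(bits); on RuntimeError return the model unquantized.

    Returns a (model, quantized) pair so the caller can tell which path ran.
    """
    try:
        return model.quantize(bits), True
    except RuntimeError as exc:
        # For example "Library cudart is not initialized", as reported here.
        print("quantization skipped: %s" % exc)
        return model, False
```

This does not fix the missing CUDA runtime library. It only lets the rest of the script continue in fp16.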

flyingtimes avatar Apr 03 '23 22:04 flyingtimes

Same issue. How can I fix it on Ubuntu?

weiliswen avatar Apr 04 '23 11:04 weiliswen

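For now, removing the --quantization_bit 4 option when training, that is, giving up 4-bit quantization, lets it run.

That's right: removing --quantization_bit 4 does make the error go away. I wonder whether the maintainers are aware of this?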

gg22mm avatar Apr 14 '23 08:04 gg22mm

The same problem also occurs at inference, and inference does not even have this parameter.

gg22mm avatar Apr 14 '23 08:04 gg22mm

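Most likely the installed CUDA version differs from the CUDA version your PyTorch build targets. On Windows I had CUDA 12 installed while my PyTorch build targeted CUDA 11.8, and I got this error. Uninstalling CUDA and installing CUDA 11.8 fixed it.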
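A quick way to spot such a mismatch is to compare the release reported by `nvcc --version` with `torch.version.cuda`. The sketch below uses hypothetical helper names and assumes nvcc is on PATH. It only sees the toolkit whose nvcc comes first on PATH, which may not be the one the failing library lookup would use.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract 'major.minor' from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return "%s.%s" % match.groups() if match else None


def installed_toolkit_version():
    """Return the CUDA toolkit version reported by nvcc, or None without nvcc."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def same_major_minor(a, b):
    """True when two version strings agree on their first two components."""
    return a.split(".")[:2] == b.split(".")[:2]
```

Comparing `installed_toolkit_version()` with `torch.version.cuda` through `same_major_minor` flags the CUDA 12 versus 11.8 situation described above.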

yuquant avatar Apr 18 '23 00:04 yuquant

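I have the same problem. As far as I can tell, the suggestions above are nonsense; this is simply not an environment problem. How do I fix it? This is frustrating: a few lines of code and so many compatibility problems. (two images attached)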

me too

SeekPoint avatar Apr 18 '23 05:04 SeekPoint

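I have the same problem. As far as I can tell, the suggestions above are nonsense; this is simply not an environment problem. How do I fix it? This is frustrating: a few lines of code and so many compatibility problems. (two images attached)

me too

Try removing --quantization_bit 4.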

529106896 avatar Apr 18 '23 08:04 529106896

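The same problem also occurs at inference, and inference does not even have this parameter.

This does happen at inference. I installed cudatoolkit and it still does not work.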

bingoohe avatar Apr 19 '23 02:04 bingoohe

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look for the corresponding packages.
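To check whether the shared libraries are discoverable before and after installing the packages, one option is `ctypes.util.find_library` from the standard library. This is a sketch aimed at Linux. The helper names are hypothetical, and it is an assumption that this lookup matches what cpm_kernels does internally.

```python
import ctypes.util


def find_cuda_libraries(names=("cudart", "cublasLt", "nvrtc")):
    """Map each library name to what ctypes.util.find_library resolves, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


def missing_libraries(found):
    """Return the sorted names that could not be resolved."""
    return sorted(name for name, path in found.items() if path is None)
```

If `missing_libraries(find_cuda_libraries())` still lists "cudart" after installation, the library is probably in a directory the dynamic linker does not search.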

Richard-Ni avatar Apr 19 '23 14:04 Richard-Ni

  1. First, find the library file whose name starts with "nvrtc" inside the torch package in your environment. For example, on Windows with miniconda, mine is at C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll. The path will differ between platforms.
  2. Add the directory containing that file to PATH. If you would rather not modify the system-wide PATH, you can add it at the top of the script, right after import os, for example: os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'
  3. After that, it should run.

This method works in my environment. Here is a more general snippet as well:

import pkg_resources
import os
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + pkg_resources.resource_filename('torch', 'lib')
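Since pkg_resources ships with setuptools and is deprecated there, the same idea can be sketched with the standard library only. The function names below are hypothetical. `importlib.util.find_spec` locates the package without importing it, and the `lib` subdirectory is taken from the snippet above.

```python
import importlib.util
import os


def package_subdir(package, subdir="lib"):
    """Return <package directory>/<subdir> without importing the package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return os.path.join(os.path.dirname(spec.origin), subdir)


def add_package_lib_to_path(package="torch", subdir="lib"):
    """Append the package's lib directory to PATH if it exists; return it or None."""
    directory = package_subdir(package, subdir)
    if directory is None or not os.path.isdir(directory):
        return None
    os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + directory
    return directory
```

Calling `add_package_lib_to_path()` at the top of the script has the same effect as the pkg_resources version, and returns None instead of raising when torch is not installed.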

l3yx avatar Apr 26 '23 01:04 l3yx

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look for the corresponding packages.

This works.

siyuan163 avatar May 15 '23 03:05 siyuan163

In your conda environment, install the cuda-toolkit package that matches your CUDA version. For example, I am on the latest CUDA, 12.1:

conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit

https://anaconda.org/nvidia/cuda-toolkit

linuxdevopscn avatar May 25 '23 06:05 linuxdevopscn

@weiliswen

I tried the same approach on Ubuntu:

conda install cudatoolkit=11.8 -c nvidia

It works for me.

GoldExperience avatar May 25 '23 10:05 GoldExperience

That's right, this fixes it directly. In addition, on my Ubuntu 22.04 I also hit a gcc error at compile time, "crti.o no such file or directory", which I fixed with:

sudo apt install libc6=2.35-0ubuntu3
sudo apt install libc6-dev

murainwood avatar Jun 01 '23 10:06 murainwood

On Linux, this may be a way to solve it. See: Support loading cuda libraries from nvidia package. https://github.com/OpenBMB/cpm_kernels/pull/8
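As a rough illustration of loading CUDA libraries from an "nvidia package", the sketch below searches for a shared library inside pip-installed NVIDIA wheels. It is not taken from the linked pull request. The directory layout `site-packages/nvidia/<component>/lib/` and the helper name `find_in_nvidia_wheels` are assumptions.

```python
import glob
import os
import sysconfig


def find_in_nvidia_wheels(name, site_dir=None):
    """Return sorted paths matching <site_dir>/nvidia/*/lib/lib<name>.so*."""
    if site_dir is None:
        # Default to the site-packages directory of the running interpreter.
        site_dir = sysconfig.get_paths()["purelib"]
    pattern = os.path.join(site_dir, "nvidia", "*", "lib", "lib%s.so*" % name)
    return sorted(glob.glob(pattern))
```

A path returned by `find_in_nvidia_wheels("cudart")` could then be passed to `ctypes.CDLL` to load the library explicitly.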

codingfun2022 avatar Jun 14 '23 04:06 codingfun2022

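This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look for the corresponding packages.

This works, thank you very much.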

jushe avatar Jun 15 '23 16:06 jushe

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look for the corresponding packages.

I have the same environment and hit the same problem, and this works for me. Thanks.

ablozhou avatar Jun 26 '23 09:06 ablozhou

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look for the corresponding packages.

Correct answer! On Ubuntu 20.04, run:

sudo apt install libcudart10.1 libcublaslt10
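The two apt commands reported in this thread can be collected into a small lookup keyed by Ubuntu release. This is a sketch with hypothetical names. The package lists are copied from the comments here and have not been verified against other releases or other distributions.

```python
# Package names exactly as reported by commenters in this thread.
PACKAGES_BY_UBUNTU_RELEASE = {
    "22.04": ["libcudart11.0", "libcublaslt11"],
    "20.04": ["libcudart10.1", "libcublaslt10"],
}


def read_version_id(os_release_text):
    """Extract VERSION_ID from the contents of /etc/os-release, or None."""
    for line in os_release_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "VERSION_ID":
            return value.strip().strip('"')
    return None


def apt_command(version_id):
    """Return the apt command suggested in the thread, or None for other releases."""
    packages = PACKAGES_BY_UBUNTU_RELEASE.get(version_id)
    if packages is None:
        return None
    return "sudo apt install " + " ".join(packages)
```

Reading /etc/os-release and passing its text through `read_version_id` and then `apt_command` yields the matching command on Ubuntu. The table does not apply to other distributions, even though they also define VERSION_ID.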

njutsiang avatar Jul 01 '23 05:07 njutsiang

  1. First, find the library file whose name starts with "nvrtc" inside the torch package in your environment. For example, on Windows with miniconda, mine is at C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll. The path will differ between platforms.
  2. Add the directory containing that file to PATH. If you would rather not modify the system-wide PATH, you can add it at the top of the script, right after import os, for example: os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'
  3. After that, it should run.

This method works in my environment. Here is a more general snippet as well:

import pkg_resources
import os
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + pkg_resources.resource_filename('torch', 'lib')

This solved my problem.

KelvinJhu avatar Jul 07 '23 03:07 KelvinJhu

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look for the corresponding packages.

Correct answer! On Ubuntu 20.04, run:

sudo apt install libcudart10.1 libcublaslt10

The versions must match; otherwise nvidia-smi fails with "Failed to initialize NVML: Driver/library version mismatch".

yzbx avatar Jul 11 '23 02:07 yzbx