Open
ai499 opened this issue 1 year ago · 18 comments
Inference code:

```python
from transformers import AutoTokenizer, AutoModel
model = AutoModel.from_pretrained("/root/visualglm_6B", trust_remote_code=True).float()
```

Error message:
```
[2023-08-22 08:45:55,509] [INFO] DeepSpeed/CUDA is not installed, fallback to Pytorch checkpointing.
[2023-08-22 08:45:55,529] [WARNING] DeepSpeed Not Installed, you cannot import training_main from sat now.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
    return model_class.from_pretrained(
  File "/root/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/visualglm_6B/modeling_chatglm.py", line 1345, in __init__
    self.image_encoder = BLIP2(config.eva_config, config.qformer_config)
  File "/root/.cache/huggingface/modules/transformers_modules/visualglm_6B/visual.py", line 59, in __init__
    self.vit = EVAViT(EVAViT.get_args(**eva_args))
  File "/root/.cache/huggingface/modules/transformers_modules/visualglm_6B/visual.py", line 20, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kwargs)
  File "/root/.local/lib/python3.10/site-packages/sat/model/official/vit_model.py", line 110, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kwargs)
  File "/root/.local/lib/python3.10/site-packages/sat/model/base_model.py", line 88, in __init__
    success = _simple_init(model_parallel_size=args.model_parallel_size)
  File "/root/.local/lib/python3.10/site-packages/sat/arguments.py", line 304, in _simple_init
    if initialize_distributed(args): # first time init model parallel, print warning
  File "/root/.local/lib/python3.10/site-packages/sat/arguments.py", line 507, in initialize_distributed
    torch.distributed.init_process_group(
  File "/root/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "/root/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1024, in _new_process_group_helper
    backend_class = ProcessGroupNCCL(backend_prefix_store, group_rank, group_size, pg_options)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
```
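The traceback shows that sat calls `torch.distributed.init_process_group` with the NCCL backend, which only works when at least one CUDA device is visible. As a rough sketch of the underlying issue (`choose_backend` is a hypothetical helper, not part of sat or PyTorch), a CPU-only process would need the Gloo backend instead:

```python
def choose_backend(cuda_available: bool) -> str:
    """Pick a torch.distributed backend that matches the hardware.

    Hypothetical helper: NCCL requires GPUs, while Gloo also runs on
    CPU-only machines, which is why ProcessGroupNCCL raises
    "no GPUs found!" on a host without CUDA devices.
    """
    return "nccl" if cuda_available else "gloo"

# On a CPU-only machine this would pick "gloo",
# avoiding the ProcessGroupNCCL error above.
print(choose_backend(False))
```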
My SwissArmyTransformer is 0.4.8, which is already the latest version, and running cli_demo.py directly still raises this same error. How did you solve it? @ai499 @1049451037
```
[INFO] DeepSpeed/CUDA is not installed, fallback to Pytorch checkpointing.
[INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Failed to load bitsandbytes:No module named 'bitsandbytes'
[INFO] building VisualGLMModel model ...
Traceback (most recent call last):
  File "/work/home/VisualGLM-6B-main/cli_demo.py", line 103, in <module>
    main()
  File "/work/home/VisualGLM-6B-main/cli_demo.py", line 30, in main
    model, model_args = AutoModel.from_pretrained(
  File "/work/.env/visualglm/lib/python3.10/site-packages/sat/model/base_model.py", line 337, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/work/.env/visualglm/lib/python3.10/site-packages/sat/model/base_model.py", line 329, in from_pretrained_base
    model = get_model(args, model_cls, **kwargs)
  File "/work/.env/visualglm/lib/python3.10/site-packages/sat/model/base_model.py", line 379, in get_model
    model = model_cls(args, params_dtype=params_dtype, **kwargs)
  File "/work/home/VisualGLM-6B-main/model/visualglm.py", line 32, in __init__
    super().__init__(args, transformer=transformer, **kwargs)
  File "/work/.env/visualglm/lib/python3.10/site-packages/sat/model/official/chatglm_model.py", line 167, in __init__
    super(ChatGLMModel, self).__init__(args, transformer=transformer, activation_func=gelu, **kwargs)
  File "/work/.env/visualglm/lib/python3.10/site-packages/sat/model/base_model.py", line 88, in __init__
    success = _simple_init(model_parallel_size=args.model_parallel_size)
  File "/work/.env/visualglm/lib/python3.10/site-packages/sat/arguments.py", line 308, in _simple_init
    if initialize_distributed(args): # first time init model parallel, print warning
  File "/work/.env/visualglm/lib/python3.10/site-packages/sat/arguments.py", line 511, in initialize_distributed
    torch.distributed.init_process_group(
  File "/work/.env/visualglm/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 74, in wrapper
    func_return = func(*args, **kwargs)
  File "/work/.env/visualglm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1148, in init_process_group
    default_pg, _ = _new_process_group_helper(
  File "/work/.env/visualglm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1279, in _new_process_group_helper
    backend_class = ProcessGroupNCCL(backend_prefix_store, group_rank, group_size, pg_options)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
```
```shell
git clone https://github.com/THUDM/SwissArmyTransformer
cd SwissArmyTransformer
pip install .
```
Thanks, that fixed the problem! But a new one appeared: when I run inference on the GPU, I get the following error:

```
RuntimeError: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver.
```

It does run on CPU, but far too slowly. My CUDA version is 11.7, which is not that old, right? So does this project have a specific CUDA version requirement?
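For what it's worth, the `11070` in that message is not a strange version number: PyTorch reports the driver's CUDA version using the standard CUDA encoding `major * 1000 + minor * 10`, so `11070` means CUDA 11.7. The error then usually means the installed PyTorch wheel was built against a newer CUDA (e.g. a cu12x build) than the 11.7 driver supports, so a cu117 wheel or a driver upgrade would be the likely fixes. A small sketch of the decoding (the helper name is my own):

```python
def decode_cuda_version(v: int) -> str:
    """Decode an integer CUDA version as used by the CUDA runtime API.

    CUDA encodes versions as major * 1000 + minor * 10, so the
    "found version 11070" in the PyTorch error means driver CUDA 11.7.
    """
    major, rest = divmod(v, 1000)
    minor = rest // 10
    return f"{major}.{minor}"

print(decode_cuda_version(11070))  # -> 11.7
```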