VisualGLM-6B
Running my own finetuned weights with cli_demo.py fails with: The size of tensor a (12288) must match the size of tensor b (25165824) at non-singleton dimension 0
qlora weights can only run on CUDA, not on the CPU. That's how bitsandbytes implements it; there's nothing I can do about it.
So if I'm running finetuned weights through cli_demo.py, I can't add the --quant flag, and if I do add it the model can only be loaded on the CPU. Is that right?
qlora checkpoints can't be used with --quant; these are two independent features. --quant is meant for models that aren't already qlora-quantized.
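In other words (assuming the stock cli_demo.py flags), the two usages stay separate: `python cli_demo.py --quant 4` quantizes the original, non-qlora model on the fly, while a qlora checkpoint is loaded with `python cli_demo.py --from_pretrained <your_checkpoint>` and no --quant flag.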
Then how do I run my own finetuned weights?
Just run it directly, following the command in the README:
https://github.com/THUDM/VisualGLM-6B#模型微调
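Concretely, that section boils down to a command of the form `python cli_demo.py --from_pretrained your_checkpoint_path --prompt_zh 这张图片的背景里有什么内容?`, with your_checkpoint_path pointing at the saved finetune directory.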
I'm on a 2080 Ti with 11 GB of VRAM. A single card can't run the model; it OOMs immediately.
Then how did you manage to finetune it in the first place...
I finetuned with qlora. So the saved weights are qlora-type? But when I run cli_demo.py without --quant it OOMs right away.
Yes, the saved weights are qlora.
Can you tell where the OOM is raised? Maybe the generated sequence got too long and blew the memory.
It errors out as soon as I run python cli_demo.py --from_pretrained xxx:
```
[2023-08-03 14:26:06,452] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /root/anaconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda115.so
/root/anaconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /root/anaconda3/envs/llm did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /root/.kiwi/lib/cuda11.5/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 115
CUDA SETUP: Loading binary /root/anaconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda115.so...
[2023-08-03 14:26:08,291] [INFO] building FineTuneVisualGLMModel model ...
[2023-08-03 14:26:08,293] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-08-03 14:26:08,294] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
Traceback (most recent call last):
  File "/data/data_01/llm/VisualGLM-6B/cli_demo.py", line 103, in <module>
```
Do the qlora weights need to be merged first?
Oh, I see: the model gets placed on cuda at construction time... For now the only fix is this workaround. Replace this code in cli_demo.py:
```python
# load model
model, model_args = AutoModel.from_pretrained(
    args.from_pretrained,
    args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=True if (torch.cuda.is_available() and args.quant is None) else False,
        device='cuda' if (torch.cuda.is_available() and args.quant is None) else 'cpu',
    ))
```
with this:
```python
# load model
model, model_args = AutoModel.from_pretrained(
    args.from_pretrained,
    args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=False,
        device='cpu',
    ), build_only=True)
model = model.to('cuda')
from sat.training.model_io import load_checkpoint
load_checkpoint(model, model_args, load_path=args.from_pretrained)
```
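The idea of the workaround: build_only=True with device='cpu' constructs the model skeleton without touching the GPU, the model is then moved to CUDA as a whole, and only afterwards does load_checkpoint restore the qlora checkpoint, so the quantized weights are loaded into a model that already lives on the GPU (which, as noted above, is the only place qlora weights can run).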
Bro, it still OOMs right away.
That shouldn't happen. Are you sure you finetuned on this very card? If so, it should obviously be loadable.
It's the same card... and it OOMs right at this spot.
If you had swapped in my code there, this part wouldn't use any GPU at all, because device='cpu'. So you must not have changed the code.
Please do exactly what I said above.
I did change it before; this time it works. Thanks a lot!
I just tested it, and the finetuned model performs quite poorly.
Anything related to model training you'll have to explore on your own.
OK, I'll keep experimenting.
parser.add_argument("--prompt_zh", type=str, default="描述这张图片", help='Chinese prompt for the first round') 请问您在执行cli_demo.py时,这个prompt是什么,我训练给的prompt有好几种,这块为什么需要提前输入一种prompt呢
You can modify cli_demo.py however you need.
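For instance, a minimal sketch (the interactive fallback here is a hypothetical tweak, not the actual cli_demo.py logic) that lets each session override the first-round prompt instead of hard-coding one:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--prompt_zh", type=str, default="描述这张图片",
                    help='Chinese prompt for the first round')
args = parser.parse_args()

# Hypothetical tweak: ask for the first-round prompt at startup and fall back
# to the --prompt_zh flag when the user just presses Enter.
entered = input(f"First-round prompt [{args.prompt_zh}]: ").strip()
first_round_prompt = entered or args.prompt_zh
```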
Thanks for the patient replies.
Bro, after making this change I get a new error. What's causing this one?

```
[2023-11-12 19:24:25,752] [INFO] [RANK 0] global rank 0 is loading checkpoint /home/llf/VisualGLM-6B/checkpoints/finetune-visualglm-6b-11-12-19-08/100/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "/home/llf/VisualGLM-6B/cli_demo.py", line 115, in <module>
    main()
  File "/home/llf/VisualGLM-6B/cli_demo.py", line 48, in main
    load_checkpoint(model, model_args, load_path=args.from_pretrained)
  File "/home/llf/anaconda3/envs/visualglm/lib/python3.10/site-packages/sat/training/model_io.py", line 238, in load_checkpoint
    missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
  File "/home/llf/anaconda3/envs/visualglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2027, in load_state_dict
    load(self, state_dict)
  File "/home/llf/anaconda3/envs/visualglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "/home/llf/anaconda3/envs/visualglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "/home/llf/anaconda3/envs/visualglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  [Previous line repeated 3 more times]
  File "/home/llf/anaconda3/envs/visualglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2009, in load
    module._load_from_state_dict(
  File "/home/llf/anaconda3/envs/visualglm/lib/python3.10/site-packages/sat/model/finetune/lora2.py", line 49, in _load_from_state_dict
    copy_nested_list(state_dict[prefix+'quant_state'], self.weight.quant_state)
  File "/home/llf/anaconda3/envs/visualglm/lib/python3.10/site-packages/sat/model/finetune/lora2.py", line 37, in copy_nested_list
    for i in range(len(dst)):
TypeError: object of type 'QuantState' has no len()
```
Has this problem been solved? I ran the official finetuning project with the QLoRA method and hit the same error when loading the model.
I'm hitting the same problem. I finetuned the official project with QLoRA, and running it on a 4090 errors out:

```
(py310_chat) yl@4-gpu:~/llm_ll/VisualGLM-6B$ CUDA_VISIBLE_DEVICES=1 python cli_demo.py --from_pretrained checkpoints/finetune-visualglm-6b-12-11-16-53/ --prompt_zh 这张图片的背景里有什么内容?
[2023-12-12 12:34:49,199] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-12 12:34:52,703] [INFO] building FineTuneVisualGLMModel model ...
[2023-12-12 12:34:52,705] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-12-12 12:34:52,706] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2023-12-12 12:34:52,707] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/data1/yl/anaconda3/envs/py310_chat/lib/python3.10/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
[2023-12-12 12:35:30,798] [INFO] [RANK 0] replacing layer 0 attention with lora
[2023-12-12 12:35:32,391] [INFO] [RANK 0] replacing layer 14 attention with lora
[2023-12-12 12:35:34,383] [INFO] [RANK 0] replacing chatglm linear layer with 4bit
[2023-12-12 12:38:06,999] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7802848768
[2023-12-12 12:38:20,010] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/finetune-visualglm-6b-12-11-16-53/300/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "/data1/yl/llm_ll/VisualGLM-6B/cli_demo.py", line 116, in <module>
```