ChatGLM-6B
[BUG/Help] What configuration does DeepSpeed need to load the model? 4x V100 (32 GB each), and it won't run
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
4x V100 GPUs (32 GB each); training fails to start.
Expected Behavior
No response
Steps To Reproduce
Environment
- OS: CentOS 7
- Python: 3.8
- Transformers: 4.27.1
- PyTorch: 1.12
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
When you say it "won't run", do you mean it runs without errors but produces no response?
Is it the same situation as this one? https://github.com/THUDM/ChatGLM-6B/issues/592#event-9003274415
I have the same GPU, and I can only run it on a single GPU. Can you show your error?
The code reports CUDA out of memory at runtime.
Change it to `deepspeed --num_gpus=4`.
Changed it.
On my side it won't run with 4x 40 GB, but 7x 40 GB works.
Try adjusting the `max_source_length` parameter.
Changed it.
OK. The official code uses 4 GPUs, but there's no way to tell what machines they ran it on.
They were probably using 80 GB A100s.
On my side, as soon as I set 8 GPUs, or any count other than 4, I get a "Bus error: nonexistent physical address".
> The code reports CUDA out of memory at runtime.

Did you ever get it running? I'm hitting the same problem.
> 7x 40 GB

What did you set `max_source_length` to? Is it 64, the same as the example?
Loading across GPU + CPU, it runs.
A question: the model produced after DeepSpeed training is an incremental (delta) model. How do I apply it on top of the original model?
Then how should the deepspeed.json config file be modified? Could you share yours?
I can get it running with 8 V100s, but `max_source_length` can only be set to 64. To set it to 256, is adding more cards the only option? It feels like full-model fine-tuning really requires model parallelism. (A rough memory estimate follows below.)
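As a rough sanity check on the memory question (my own back-of-the-envelope arithmetic, not from this thread, using the standard ~16 bytes/parameter accounting for mixed-precision Adam from the ZeRO paper):

```python
# Back-of-the-envelope estimate for full fine-tuning of a 6B-parameter model
# with mixed-precision Adam, ignoring activations (which grow with sequence
# length) and framework overhead.
params = 6e9
# fp16 weights + fp16 grads + fp32 master weights + two fp32 Adam moments
total_gib = params * (2 + 2 + 4 + 4 + 4) / 1024**3
print(f"total training state: ~{total_gib:.0f} GiB")  # ~89 GiB

for n_gpus in (4, 8):
    # ZeRO-2 shards grads + optimizer state; fp16 weights stay replicated.
    per_gpu = (params * 2 + params * (2 + 4 + 4 + 4) / n_gpus) / 1024**3
    print(f"ZeRO-2 on {n_gpus} GPUs: ~{per_gpu:.0f} GiB per GPU")  # ~31 / ~21 GiB
```

This squares with the reports above: the static state alone nearly fills a 32 GB card under ZeRO-2, and activations on top of that scale with `max_source_length`.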
{ "train_micro_batch_size_per_gpu": "auto", "zero_allow_untested_optimizer": true, "fp16": { "enabled": "auto", "loss_scale": 0, "initial_scale_power": 16, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "optimizer": { "type": "AdamW", "params": { "lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto" } },
"zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu", "pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 2e8, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 2e8, "contiguous_gradients" : true } }
Same problem here: 8x 40 GB V100s won't run it, CUDA out of memory. Neither ZeRO-2 nor ZeRO-3 works, and `empty_init` didn't help either. Is it that the model doesn't support parallelism at the implementation level, so every card has to hold the full parameters?
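On the "full parameters per card" question: under ZeRO-2 (the config above), yes for the fp16 weights; only gradients and optimizer state are partitioned. ZeRO-3 is the stage that partitions the parameters themselves. A sketch of such a config, untested here (the HF Trainer also accepts the DeepSpeed config as a Python dict):

```python
# Sketch only: ZeRO stage 3 partitions parameters as well as gradients and
# optimizer state across ranks, optionally offloading both to CPU memory.
zero3_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}
```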
Whoa, that's impressive~~
Tested it myself: runs successfully with DeepSpeed. Setup: a single RTX 4090 (24 GB). Append `--quantization_bit 4` (4-bit quantization) to the end of the command, with `--num_gpus=1`.
{'train_runtime': 2021.6082, 'train_samples_per_second': 9.893, 'train_steps_per_second': 2.473, 'train_loss': 3.52549921875, 'epoch': 0.17}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [33:41<00:00, 2.47it/s]
***** train metrics *****
epoch = 0.17
train_loss = 3.5255
train_runtime = 0:33:41.60
train_samples = 114602
train_samples_per_second = 9.893
train_steps_per_second = 2.473
[2023-04-18 17:48:39,281] [INFO] [launch.py:460:main] Process 3684 exits successfully.
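For context on what `--quantization_bit 4` does: per the ChatGLM-6B README, it corresponds to quantizing the fp16 weights to 4 bits at load time, roughly:

```python
# 4-bit loading as documented in the ChatGLM-6B README: quantize the fp16
# weights to INT4 so the model fits in roughly 6 GB of GPU memory.
from transformers import AutoModel

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = model.quantize(4).half().cuda()
```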
I downloaded the latest INT4 model and set the parameters this way, but it errors out immediately with `RuntimeError: expected scalar type Float but found Half`. How did you solve it?
In the quantization.py file under the model path, add `weight = weight.to(torch.float)` between lines 52 and 53, and again between lines 62 and 64; that fixes it. @feyxong
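A hypothetical illustration of where that cast lands (function name and shapes invented here; the real edit goes inside quantization.py's dequantization path at the line numbers given above):

```python
import torch

def dequantize(weight_q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Invented example, not the actual quantization.py code."""
    weight = weight_q * scale  # reconstructed weight; may come back as half
    # The fix from the comment above: force float32 so downstream ops don't
    # raise "expected scalar type Float but found Half".
    weight = weight.to(torch.float)
    return weight
```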
Thanks.
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 36
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
inputs 类型#裤版型#宽松风格#性感图案#线条裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
labels <image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100> 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还<image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100>
/usr/local/lib/python3.8/dist-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True
to disable this warning
warnings.warn(
[2023-05-17 00:53:40,677] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.2, git-hash=unknown, git-branch=unknown
[2023-05-17 00:53:45,424] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 44557
[2023-05-17 00:53:45,425] [ERROR] [launch.py:434:sigkill_handler] ['/usr/bin/python3', '-u', 'main.py', '--local_rank=0', '--deepspeed', 'deepspeed.json', '--do_train', '--train_file', 'AdvertiseGen/train.json', '--test_file', 'AdvertiseGen/dev.json', '--prompt_column', 'content', '--response_column', 'summary', '--overwrite_cache', '--model_name_or_path', '/root/ChatGLM-6B/chatglm-6b-int4', '--output_dir', './output/adgen-chatglm-6b-ft-1e-4', '--overwrite_output_dir', '--max_source_length', '64', '--max_target_length', '64', '--per_device_train_batch_size', '4', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--predict_with_generate', '--max_steps', '5000', '--logging_steps', '10', '--save_steps', '1000', '--learning_rate', '1e-4', '--fp16', '--quantization_bit', '4'] exits with return code = -7
The INT4 run just exits (return code = -7 in the launcher log above) without a traceback. How do I fix this?
{ "train_micro_batch_size_per_gpu": "auto", "zero_allow_untested_optimizer": true, "fp16": { "enabled": "auto", "loss_scale": 0, "initial_scale_power": 16, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "optimizer": { "type": "AdamW", "params": { "lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto" } },
"zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu", "pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 2e8, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 2e8, "contiguous_gradients" : true } }
Confirmed: 4x A40 can run it with the config above.
Duplicate of #556