
[BUG/Help] What configuration does DeepSpeed need to load the model? 4x V100 (32 GB each), and it won't run

Open expresschen opened this issue 1 year ago • 23 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

4x V100, 32 GB each; it won't run.

Expected Behavior

No response

Steps To Reproduce

Environment

- OS: CentOS 7
- Python: 3.8
- Transformers: 4.27.1
- PyTorch: 1.12
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

expresschen avatar Apr 14 '23 04:04 expresschen

By "won't run", do you mean it runs without errors but gives no response?

ZXStudio avatar Apr 14 '23 04:04 ZXStudio

Is your situation the same as this one? https://github.com/THUDM/ChatGLM-6B/issues/592#event-9003274415

white-wolf-tech avatar Apr 14 '23 04:04 white-wolf-tech

I have the same GPU, and I can only run it on a single GPU. Can you show your error?

LixinLu42 avatar Apr 14 '23 04:04 LixinLu42

The code throws CUDA out of memory at runtime.

expresschen avatar Apr 14 '23 06:04 expresschen

> The code throws CUDA out of memory at runtime.

Change the launch command to `deepspeed --num_gpus=4`.

superbigsea avatar Apr 14 '23 06:04 superbigsea

Changed it.

expresschen avatar Apr 14 '23 06:04 expresschen

> Changed it.

On my side it won't run on 4x 40 GB, but it does on 7x 40 GB.

superbigsea avatar Apr 14 '23 06:04 superbigsea

Try lowering the max_source_length parameter?

superbigsea avatar Apr 14 '23 06:04 superbigsea

> Changed it.
>
> On my side it won't run on 4x 40 GB, but it does on 7x 40 GB.

OK. The official code uses 4 GPUs, so I just don't know what machines they ran it on.

expresschen avatar Apr 14 '23 06:04 expresschen

> Changed it.
>
> On my side it won't run on 4x 40 GB, but it does on 7x 40 GB.
>
> OK. The official code uses 4 GPUs, so I just don't know what machines they ran it on.

They probably used 80 GB A100s.

liuanping avatar Apr 14 '23 06:04 liuanping

On my side, as soon as I set 8 GPUs (or any count other than 4), I get a "Bus error: nonexistent physical address".

liuanping avatar Apr 14 '23 06:04 liuanping

> The code throws CUDA out of memory at runtime.

Buddy, did you get it running? I've hit the same problem as you.

Dagoli avatar Apr 14 '23 07:04 Dagoli

> 7x 40 GB

What did you set max_source_length to? Is it 64, the same as the example?

lyx3911 avatar Apr 14 '23 08:04 lyx3911

Loading the model split across GPU + CPU, it runs.

expresschen avatar Apr 14 '23 08:04 expresschen
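One common way to split a model across GPU and CPU is Hugging Face accelerate's device_map offload; below is a minimal sketch, assuming transformers plus accelerate are installed (the memory limits are illustrative, and the poster's actual training setup uses the DeepSpeed CPU-offload config shown further down):

```python
# Minimal sketch: load ChatGLM-6B with layers split between GPU 0 and CPU RAM.
# Layers that do not fit on the GPU are offloaded to CPU, trading speed for
# avoiding CUDA out-of-memory errors.
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "THUDM/chatglm-6b"  # or a local checkpoint directory

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",                        # let accelerate place layers
    max_memory={0: "30GiB", "cpu": "64GiB"},  # illustrative caps, not measured
).eval()
```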

> Loading the model split across GPU + CPU, it runs.

A question: the model produced by DeepSpeed training is an incremental (delta) model. How do you apply it on top of the original model?

superbigsea avatar Apr 14 '23 09:04 superbigsea

> Loading the model split across GPU + CPU, it runs.

Then how should the deepspeed.json config file be modified? Could you share yours?

lyx3911 avatar Apr 14 '23 09:04 lyx3911

I can get it running on 8x V100, but max_source_length can only be set to 64. To set it to 256, is adding more GPUs the only option? It feels like full-model fine-tuning really requires model parallelism.

lyx3911 avatar Apr 14 '23 15:04 lyx3911

{ "train_micro_batch_size_per_gpu": "auto", "zero_allow_untested_optimizer": true, "fp16": { "enabled": "auto", "loss_scale": 0, "initial_scale_power": 16, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "optimizer": { "type": "AdamW", "params": { "lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto" } },

"zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu", "pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 2e8, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 2e8, "contiguous_gradients" : true } }

expresschen avatar Apr 15 '23 00:04 expresschen

Same problem here: 8x 40 GB V100 won't run it, CUDA out of memory. ZeRO-2 and ZeRO-3 both fail, and empty_init didn't help either. Does the model's underlying implementation not support parallelism? Does every card have to hold the full set of parameters?

danyang-rainbow avatar Apr 15 '23 18:04 danyang-rainbow

> Same problem here: 8x 40 GB V100 won't run it, CUDA out of memory. ZeRO-2 and ZeRO-3 both fail, and empty_init didn't help either. Does the model's underlying implementation not support parallelism? Does every card have to hold the full set of parameters?

Damn, that's wild~~

gg22mm avatar Apr 16 '23 01:04 gg22mm

Tested it myself: it trains successfully with DeepSpeed. Setup: a single RTX 4090 (24 GB GPU). Append `--quantization_bit 4` (4-bit quantization) to the end of the command, with `--num_gpus=1`.
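For reference, the same 4-bit quantization can also be applied when loading the model directly for inference; a sketch following the pattern in the ChatGLM-6B README (path illustrative):

```python
# Sketch: load ChatGLM-6B and quantize the weights to 4 bits, roughly what
# --quantization_bit 4 does inside the training script.
from transformers import AutoModel, AutoTokenizer

model_path = "THUDM/chatglm-6b"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = (
    AutoModel.from_pretrained(model_path, trust_remote_code=True)
    .quantize(4)  # 4-bit weight quantization
    .half()
    .cuda()
    .eval()
)
```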

mark-libn avatar Apr 18 '23 09:04 mark-libn

```
{'train_runtime': 2021.6082, 'train_samples_per_second': 9.893, 'train_steps_per_second': 2.473, 'train_loss': 3.52549921875, 'epoch': 0.17}
100%|██████████| 5000/5000 [33:41<00:00, 2.47it/s]
***** train metrics *****
  epoch                    = 0.17
  train_loss               = 3.5255
  train_runtime            = 0:33:41.60
  train_samples            = 114602
  train_samples_per_second = 9.893
  train_steps_per_second   = 2.473
[2023-04-18 17:48:39,281] [INFO] [launch.py:460:main] Process 3684 exits successfully.
```

mark-libn avatar Apr 18 '23 09:04 mark-libn

> Tested it myself: it trains successfully with DeepSpeed. Setup: a single RTX 4090 (24 GB GPU). Append `--quantization_bit 4` (4-bit quantization) to the end of the command, with `--num_gpus=1`.

I downloaded the latest int4 model and set the parameters this way, but it errors out immediately with RuntimeError: expected scalar type Float but found Half. How did you solve this?

feyxong avatar Apr 20 '23 12:04 feyxong

In the quantization.py file under the model path, insert `weight = weight.to(torch.float)` between lines 52 and 53, and again between lines 62 and 64, and it works. @feyxong
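The reported error suggests the dequantized weight comes back as float16 while the input is float32, and the cast aligns them. A minimal standalone illustration of the mismatch and the fix (illustrative code, not the actual contents of quantization.py):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 4)          # float32 activations
w = torch.randn(8, 4).half()   # a dequantized weight arriving as float16
# F.linear(x, w) at this point fails on CPU with:
#   RuntimeError: expected scalar type Float but found Half
w = w.to(torch.float)          # the inserted cast: float16 -> float32
y = F.linear(x, w)             # dtypes now match
print(y.shape)                 # torch.Size([2, 8])
```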

dailong avatar May 11 '23 03:05 dailong

> In the quantization.py file under the model path, insert `weight = weight.to(torch.float)` between lines 52 and 53, and again between lines 62 and 64, and it works. @feyxong

Thanks.

feyxong avatar May 11 '23 04:05 feyxong

```
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 36
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, ... 130005, 3, 3, 3, ...]
inputs 类型#裤版型#宽松风格#性感图案#线条裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
label_ids [-100, -100, ..., 130004, 5, 87052, 96914, ..., 130005, -100, -100, ...]
labels [... <image_-100> padding ...] 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还 [... <image_-100> padding ...]
/usr/local/lib/python3.8/dist-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[2023-05-17 00:53:40,677] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.2, git-hash=unknown, git-branch=unknown
[2023-05-17 00:53:45,424] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 44557
[2023-05-17 00:53:45,425] [ERROR] [launch.py:434:sigkill_handler] ['/usr/bin/python3', '-u', 'main.py', '--local_rank=0', '--deepspeed', 'deepspeed.json', '--do_train', '--train_file', 'AdvertiseGen/train.json', '--test_file', 'AdvertiseGen/dev.json', '--prompt_column', 'content', '--response_column', 'summary', '--overwrite_cache', '--model_name_or_path', '/root/ChatGLM-6B/chatglm-6b-int4', '--output_dir', './output/adgen-chatglm-6b-ft-1e-4', '--overwrite_output_dir', '--max_source_length', '64', '--max_target_length', '64', '--per_device_train_batch_size', '4', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--predict_with_generate', '--max_steps', '5000', '--logging_steps', '10', '--save_steps', '1000', '--learning_rate', '1e-4', '--fp16', '--quantization_bit', '4'] exits with return code = -7
```

The int4 run just exits without any error message. How do I fix this?

hexiaojin1314 avatar May 17 '23 00:05 hexiaojin1314

{ "train_micro_batch_size_per_gpu": "auto", "zero_allow_untested_optimizer": true, "fp16": { "enabled": "auto", "loss_scale": 0, "initial_scale_power": 16, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "optimizer": { "type": "AdamW", "params": { "lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto" } },

"zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu", "pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 2e8, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 2e8, "contiguous_gradients" : true } }

It works: with this config it runs on 4x A40.

YerayL avatar Jun 15 '23 09:06 YerayL

Duplicate of #556

zhangch9 avatar Aug 16 '23 08:08 zhangch9