
[question] How many GPUs does 'single_node' use in the default DeepSpeed-Chat training script?

Open · feiliya333 opened this issue 2 years ago · 3 comments

I only have 2 A100 (80 GB) GPUs, so I need to know how many GPUs single_node uses in order to adjust train_batch_size for my setup. Thanks in advance.

feiliya333 · Apr 22 '23 18:04

I mean the default training script for DeepSpeed-Chat.

feiliya333 · Apr 22 '23 18:04


I happen to be running this too. I ran the full pipeline end to end and hit the same OOM problem.

My system environment:

- cat /etc/issue: Ubuntu 18.04.6 LTS \n \l
- nvcc --version: Build cuda_11.3.r11.3/compiler.29745058_0
- nvidia-smi: 3090 (single GPU, 24 GB)
- pip list | grep deepspeed: 0.9.1
- pip list | grep torch: 1.12.0+cu113
- pip list | grep transformers: 4.28.1

For details, see: deepspeed体验 ("DeepSpeed hands-on experience").

chenyangMl · Apr 24 '23 13:04

@feiliya333 DeepSpeed automatically detects the number of GPUs on the node (2 in your case) and uses all of them for training in the single_node deployment type.

jomayeri · May 01 '23 20:05
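Because single_node uses every visible GPU, the world size on this machine is 2. DeepSpeed's config requires train_batch_size to equal train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size. Moving to fewer GPUs therefore means raising gradient accumulation (or the per-GPU micro-batch) to keep the same global batch. The sketch below only does that arithmetic. The target batch of 64 and micro-batch of 4 are hypothetical values, not the script's defaults, and the helper names are illustrative rather than part of DeepSpeed.

```python
"""Sketch: derive DeepSpeed batch-size settings for a given GPU count.

Assumes DeepSpeed's documented relation
    train_batch_size = train_micro_batch_size_per_gpu
                       * gradient_accumulation_steps
                       * world_size
where world_size is the number of GPUs the job runs on.
Helper names here are hypothetical, not DeepSpeed APIs.
"""


def global_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, world_size: int) -> int:
    """Return the effective global batch size (DeepSpeed's train_batch_size)."""
    return micro_batch_per_gpu * grad_accum_steps * world_size


def grad_accum_for_target(target_batch: int, micro_batch_per_gpu: int, world_size: int) -> int:
    """Return the gradient accumulation steps needed to reach target_batch.

    Raises ValueError if target_batch is not a multiple of
    micro_batch_per_gpu * world_size, since DeepSpeed would reject it.
    """
    per_step = micro_batch_per_gpu * world_size  # samples processed per micro-step across all GPUs
    if target_batch % per_step != 0:
        raise ValueError(
            f"target batch {target_batch} is not divisible by "
            f"micro_batch_per_gpu * world_size = {per_step}"
        )
    return target_batch // per_step


if __name__ == "__main__":
    # Hypothetical setup: 2 GPUs, 4 samples per GPU per micro-step,
    # and a desired global batch of 64.
    world_size = 2
    micro = 4
    accum = grad_accum_for_target(64, micro, world_size)
    print(accum)  # 8
    print(global_batch_size(micro, accum, world_size))  # 64
```

If OOM persists on 80 GB cards, lowering the per-GPU micro-batch and raising gradient accumulation by the same factor keeps train_batch_size unchanged while reducing activation memory per step.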