DeepSpeed
[question] how many GPUs are used by 'single_node' in the default deepchat running script
I only have 2 A100 (80GB) GPUs, so I need to know how many GPUs single_node uses in order to adjust train_batch_size for my setup. Thanks in advance.
I mean the default training script for deepchat.
I happen to be running this too. I ran through the complete pipeline and hit the same OOM problem.
My system environment is as follows:
cat /etc/issue | Ubuntu 18.04.6 LTS \n \l
nvcc --version | Build cuda_11.3.r11.3/compiler.29745058_0
nvidia-smi | 3090 (single GPU, 24 GB)
pip list | grep deepspeed >> 0.9.1
pip list | grep torch >> 1.12.0+cu113
pip list | grep transformers >> 4.28.1
For details, see: deepspeed experience (deepspeed体验)
@feiliya333 DeepSpeed will automatically detect the number of GPUs on the node (in your case 2) and use all of them for training in the single_node deployment type.
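In case it helps with adjusting the batch size: DeepSpeed expects train_batch_size to equal train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs. Below is a rough Python sketch of that arithmetic. The helper names and the numbers in the usage note are made up for illustration and are not taken from the deepchat scripts, so check them against your own config.

```python
def global_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Effective (global) batch size per optimizer step."""
    return micro_batch_per_gpu * grad_accum_steps * num_gpus


def micro_batch_per_gpu(train_batch_size: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Per-GPU micro batch size that yields the requested global batch size.

    Hypothetical helper: raises if the global batch size cannot be split
    evenly across the GPUs and accumulation steps.
    """
    samples_per_micro_step = grad_accum_steps * num_gpus
    if train_batch_size % samples_per_micro_step != 0:
        raise ValueError(
            f"train_batch_size={train_batch_size} is not divisible by "
            f"gradient_accumulation_steps * num_gpus = {samples_per_micro_step}"
        )
    return train_batch_size // samples_per_micro_step
```

For example, with 2 GPUs, a per-GPU micro batch of 4 and 2 accumulation steps, global_batch_size(4, 2, 2) gives 16. If you hit OOM, lowering the per-GPU micro batch and raising the accumulation steps keeps the global batch size the same. You can confirm how many GPUs PyTorch sees on the node with torch.cuda.device_count().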