zhilinwang1 issues





OOM when training a 32B model with GRPO

I am using tensor parallelism of 8, optimizer offloading, flash attention, and vLLM, and training still runs out of memory (OOM) on a machine with 8 × 96 GB GPUs. The full configuration and error are below:

nproc_per_node=8
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=$nnodes \
NODE_RANK=$RANK \
MASTER_ADDR=$MASTER_ADDR \
MASTER_PORT=$MASTER_PORT \
NPROC_PER_NODE=$nproc_per_node \
swift rlhf \
    --rlhf_type grpo \
    --model xxxx/xxxxx...
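A rough static-memory estimate can help show where the per-GPU budget goes. The sketch below is a back-of-envelope calculator. It assumes a few things that the issue does not state: full-parameter training, bf16 weights and gradients (2 bytes each), fp32 Adam states (master copy plus two moments, 12 bytes per parameter), and weights and gradients sharded evenly across the 8 GPUs. It does not count activations, vLLM's reserved weights and KV cache, or framework overhead. The function name `per_gpu_gib` is hypothetical.

```python
# Back-of-envelope static GPU memory estimate (hypothetical helper).
# Assumptions: bf16 weights/grads, fp32 Adam states (master + m + v),
# parameters and gradients sharded evenly across GPUs.
# Excludes activations, vLLM weights/KV cache, and framework overhead.

def per_gpu_gib(num_params: float, num_gpus: int,
                weight_bytes: int = 2, grad_bytes: int = 2,
                optim_bytes: int = 12, offload_optimizer: bool = True,
                ref_model: bool = False) -> float:
    """Return the estimated static memory per GPU in GiB."""
    bytes_per_param = weight_bytes + grad_bytes
    if not offload_optimizer:
        # Optimizer states stay on the GPU when not offloaded.
        bytes_per_param += optim_bytes
    if ref_model:
        # A frozen reference model only needs its weights.
        bytes_per_param += weight_bytes
    total_bytes = num_params * bytes_per_param
    return total_bytes / num_gpus / 2**30


if __name__ == "__main__":
    n = 32e9  # 32B parameters
    print(f"offloaded optimizer:       {per_gpu_gib(n, 8):.1f} GiB")
    print(f"+ reference model:         {per_gpu_gib(n, 8, ref_model=True):.1f} GiB")
    print(f"optimizer on GPU:          {per_gpu_gib(n, 8, offload_optimizer=False):.1f} GiB")
```

Under these assumptions, the static training state per GPU is well under 96 GB. An OOM would then more likely come from activations (long completions times multiple generations per prompt) or from memory that a colocated vLLM engine reserves. That is a hypothesis, not a diagnosis.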