zhilinwang1
I am using tensor parallel 8, optimizer offload, flash attention, and vLLM, and training runs out of memory (OOM) on an 8×96 GB machine. The exact configuration and error are below:

    nproc_per_node=8

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
    NNODES=$nnodes \
    NODE_RANK=$RANK \
    MASTER_ADDR=$MASTER_ADDR \
    MASTER_PORT=$MASTER_PORT \
    NPROC_PER_NODE=$nproc_per_node \
    swift rlhf \
        --rlhf_type grpo \
        --model xxxx/xxxxx...