旅行家位移
I tried adding an optimizer section to the DeepSpeed config, and with that it at least runs (8x A10).
> "optimizer": { "type": "AdamW", "params": { "lr": 1e-5, "betas": "auto", "eps": "auto", "weight_decay": "auto" } }, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu", "pin_memory": true }, "allgather_partitions": true,...
Give this a try. I've only just started the run, so I don't know yet whether there are pitfalls further in.
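For anyone who would rather generate the file than paste it, here is a rough sketch that writes only the fields visible in the quoted snippet to a JSON file. The file name `ds_config.json` and the helper `write_ds_config` are made up for illustration. The quote is cut off after `allgather_partitions`, so the rest of the `zero_optimization` block is left out here rather than guessed. As far as I know, the `"auto"` values are meant to be filled in by the Hugging Face Trainer's DeepSpeed integration from the training arguments.

```python
import json

# Only the fields visible in the quoted snippet. The original is truncated
# after "allgather_partitions", so nothing beyond that point is included.
ds_config = {
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 1e-5,
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto",
        },
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "allgather_partitions": True,
    },
}


def write_ds_config(path="ds_config.json"):
    """Write the partial config to `path` (hypothetical file name) and return the path."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)
    return path


if __name__ == "__main__":
    print(write_ds_config())
```

The written file would still need whatever the truncated part of the quote contained before it matches the config I am actually running.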
The optimizer that ships with transformers doesn't perform well.
--gradient_accumulation_steps 16 \