Yidhar comments

Results: 3 comments by Yidhar

With strategy deepspeed, ZeRO raises a key error and the run just crashes

> Could you provide more information about your configuration, hardware and environment? As a common solution, reducing the batch size or the number of accumulation steps might help. We use H100*8 and use...

With strategy deepspeed, ZeRO raises a key error and the run just crashes

> > Could you provide more information about your configuration, hardware and environment? As a common solution, reducing the batch size or the number of accumulation steps might help...

With strategy deepspeed, ZeRO raises a key error and the run just crashes

DeepSpeed config: ` { "train_batch_size": 8, "zero_optimization": { "stage": 1, "allgather_partitions": true, "allgather_bucket_size": 1e9, "reduce_scatter": true, "reduce_bucket_size": 1e9, "overlap_comm": true, "contiguous_gradients": true }, "bf16": { "enabled": true } } `...
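
A config like this can be sanity-checked before launch. The sketch below is not a diagnosis of the crash reported here; it uses only the Python standard library and covers two common pitfalls. First, the file must be valid JSON, so a dropped comma between keys makes parsing fail. Second, DeepSpeed expects `train_batch_size` to equal micro batch per GPU × gradient accumulation steps × number of GPUs. The helper names `load_config` and `batch_size_consistent`, the micro batch of 1, and the world size of 8 are assumptions for illustration, not values confirmed in this thread.

```python
import json

# Hypothetical variant with the comma after "train_batch_size": 8 dropped.
BROKEN = '{ "train_batch_size": 8 "zero_optimization": { "stage": 1 } }'

# Keys and values taken from the comment above, with commas between all keys.
VALID = """{
  "train_batch_size": 8,
  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 1e9,
    "reduce_scatter": true,
    "reduce_bucket_size": 1e9,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": true }
}"""


def load_config(text):
    """Return the parsed config dict, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def batch_size_consistent(cfg, micro_batch_per_gpu, grad_accum_steps, world_size):
    """Check train_batch_size == micro batch * accumulation steps * GPUs."""
    expected = micro_batch_per_gpu * grad_accum_steps * world_size
    return cfg["train_batch_size"] == expected


if __name__ == "__main__":
    print("broken parses:", load_config(BROKEN) is not None)
    cfg = load_config(VALID)
    # Assumed setup: 1 sample per GPU, no accumulation, 8 GPUs.
    print("batch size consistent:", batch_size_consistent(cfg, 1, 1, 8))
```

If the consistency check fails for the actual launch settings, adjusting the micro batch size or accumulation steps so the product matches `train_batch_size` is the usual first step.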