Yidhar
> Could you provide more information about your configuration, hardware and environment? As a common solution, reducing the batch size or accumulation steps might help.

We use 8 H100 GPUs and use...
> > Could you provide more information about your configuration, hardware and environment? As a common solution, reducing the batch size or accumulation steps might help.
DeepSpeed config:

```json
{
  "train_batch_size": 8,
  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 1e9,
    "reduce_scatter": true,
    "reduce_bucket_size": 1e9,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": {
    "enabled": true
  }
}
```

...
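DeepSpeed documents that `train_batch_size` must equal `train_micro_batch_size_per_gpu` × `gradient_accumulation_steps` × number of GPUs. The sketch below checks what that implies for the config above on 8 GPUs. It is not DeepSpeed's own code: the helper `resolve_batch_sizes` is hypothetical, and it assumes accumulation falls back to 1 step when only `train_batch_size` is set and that all 8 GPUs are used for pure data parallelism.

```python
import json

# The DeepSpeed config from the comment, with the missing comma after
# "train_batch_size" restored so that it parses as JSON.
CONFIG_TEXT = """
{
  "train_batch_size": 8,
  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 1e9,
    "reduce_scatter": true,
    "reduce_bucket_size": 1e9,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": {"enabled": true}
}
"""


def resolve_batch_sizes(train_batch_size, world_size,
                        micro_batch_per_gpu=None, grad_accum_steps=None):
    """Hypothetical helper: return (micro_batch_per_gpu, grad_accum_steps) such that
    train_batch_size == micro_batch_per_gpu * grad_accum_steps * world_size.

    Assumption: if neither value is given, accumulation defaults to 1 step.
    """
    if micro_batch_per_gpu is None and grad_accum_steps is None:
        grad_accum_steps = 1
    if micro_batch_per_gpu is None:
        micro_batch_per_gpu = train_batch_size // (grad_accum_steps * world_size)
    elif grad_accum_steps is None:
        grad_accum_steps = train_batch_size // (micro_batch_per_gpu * world_size)
    # Reject combinations that round down to zero or do not multiply out exactly.
    if (micro_batch_per_gpu < 1 or grad_accum_steps < 1
            or micro_batch_per_gpu * grad_accum_steps * world_size != train_batch_size):
        raise ValueError(
            f"train_batch_size={train_batch_size} != "
            f"micro_batch_per_gpu={micro_batch_per_gpu} x "
            f"grad_accum_steps={grad_accum_steps} x world_size={world_size}"
        )
    return micro_batch_per_gpu, grad_accum_steps


config = json.loads(CONFIG_TEXT)
# 8 x H100, assumed to be pure data parallelism.
micro, accum = resolve_batch_sizes(config["train_batch_size"], world_size=8)
print(micro, accum)  # 1 1
```

Under these assumptions each GPU already processes one sample per step with no accumulation, so `train_batch_size` cannot be lowered further on 8 GPUs without violating the constraint (for example, requesting 2 accumulation steps raises `ValueError`).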