Does DeepSpeed + Accelerate Support Pipeline Parallelism
I have been trying a number of pipeline configs in DeepSpeed, like the following:
```json
{
  "fp16": {
    "enabled": true
  },
  "bf16": {
    "enabled": false
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "gather_16bit_weights_on_model_save": true,
    "round_robin_gradients": true,
    "reduce_scatter": true,
    "zero_quantized_weights": true,
    "zero_hpz_partition_size": 8,
    "zero_quantized_gradients": true
  },
  "gradient_accumulation_steps": 1,
  "gradient_clipping": "auto",
  "steps_per_print": 1,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false,
  "flops_profiler": {
    "enabled": true,
    "profile_step": 10,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true
  },
  "pipeline": {
    "stages": 8,
    "partition_method": "uniform"
  }
}
```
I can see the pipeline settings being displayed in my training logs when DeepSpeed prints the full configuration. However, the changes I make to the pipeline section seem to have no effect on training. I am wondering if these config options are somehow being thrown away by Accelerate. Curious if others have found ways to get some introspection on how PP is working with DeepSpeed + Accelerate.
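For what it's worth, one quick check is to look at which engine class DeepSpeed actually built after `accelerator.prepare`. This is only a sketch, assuming the prepared model object is the DeepSpeed engine itself and using placeholder `model`/`optimizer`/`dataloader` objects from an existing training script:

```python
# Sketch: check whether DeepSpeed built a PipelineEngine (pipeline parallelism)
# or a plain DeepSpeedEngine after Accelerate's prepare().
# model, optimizer, and dataloader are placeholders for whatever the training
# script already constructs.
from accelerate import Accelerator
from deepspeed.runtime.pipe.engine import PipelineEngine

accelerator = Accelerator()  # assumes DeepSpeed is configured via `accelerate config`
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

print("engine class:", type(model).__name__)
print("pipeline parallelism active:", isinstance(model, PipelineEngine))
```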
It seems from the docs that this is called out in a caveat. It might make sense to crash loudly when someone tries to configure PP directly? Also, what is the plan to integrate PP into Accelerate?
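For context on the caveat: as far as I understand, DeepSpeed only consumes the "pipeline" section when the model is built as a `deepspeed.pipe.PipelineModule` and passed to `deepspeed.initialize` directly, which Accelerate's `prepare` does not do. PP also only composes with ZeRO stage 1 or below, so the stage-3 config above would conflict regardless. A rough sketch of the raw DeepSpeed path, with hypothetical `build_layers()` / `train_dataloader` / `num_steps` placeholders, looks like this:

```python
# Rough sketch of DeepSpeed pipeline parallelism used directly (no Accelerate).
# build_layers(), train_dataloader, and num_steps are hypothetical placeholders.
import torch
import deepspeed
from deepspeed.pipe import PipelineModule

layers = build_layers()  # flat list of nn.Module blocks making up the model

model = PipelineModule(
    layers=layers,
    num_stages=8,                # corresponds to "pipeline.stages"
    partition_method="uniform",  # corresponds to "pipeline.partition_method"
    loss_fn=torch.nn.CrossEntropyLoss(),
)

# With a PipelineModule, deepspeed.initialize returns a PipelineEngine.
engine, _, _, _ = deepspeed.initialize(
    model=model,
    config="ds_config.json",  # would need ZeRO stage <= 1 for PP
    model_parameters=[p for p in model.parameters() if p.requires_grad],
)

data_iter = iter(train_dataloader)
for _ in range(num_steps):
    # train_batch runs all micro-batches of the pipeline schedule for one step
    loss = engine.train_batch(data_iter=data_iter)
```

Until something like this is wired up, the pipeline keys in the JSON appear to just be echoed in the printed config without changing how training runs, which would match the behavior described above.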
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Same question: does Accelerate support pipeline parallelism? Hoping this reopens the issue.