
Does DeepSpeed + Accelerate Support Pipeline Parallelism

Open · sam-h-bean opened this issue 1 year ago · 1 comment

I have been trying a number of pipeline configs in DeepSpeed, like the following:

{
    "fp16": {
        "enabled": true
    },
    "bf16": {
        "enabled": false
    },
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "gather_16bit_weights_on_model_save": true,
        "round_robin_gradients": true,
        "reduce_scatter": true,
        "zero_quantized_weights": true,
        "zero_hpz_partition_size": 8,
        "zero_quantized_gradients": true
    },
    "gradient_accumulation_steps": 1,
    "gradient_clipping": "auto",
    "steps_per_print": 1,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false,
    "flops_profiler": {
        "enabled": true,
        "profile_step": 10,
        "module_depth": -1,
        "top_modules": 1,
        "detailed": true
    },
    "pipeline": {
        "stages": 8,
        "partition_method": "uniform"
    }
}
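As a first debugging step (stdlib only, no DeepSpeed required), it can help to confirm that the config file parses cleanly and that the `"pipeline"` block is really present in the dict that gets handed to DeepSpeed — the inline string below stands in for loading the file above with `json.load`:

```python
# Sanity check: the config parses as valid JSON and the "pipeline" section
# survives into the resulting dict. In practice you would do
# cfg = json.load(open("ds_config.json")) on the file shown above.
import json

cfg = json.loads("""{
    "zero_optimization": {"stage": 3},
    "pipeline": {"stages": 8, "partition_method": "uniform"}
}""")

print(sorted(cfg))               # top-level sections DeepSpeed will see
print(cfg["pipeline"]["stages"])
```

If the section shows up here but has no effect at runtime, the problem is in how the integration consumes the config, not in the file itself.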

I can see the pipeline settings echoed in my training logs when DeepSpeed prints the full configuration. However, the changes I make to the pipeline section seem to have no effect on training. I am wondering whether these config options are being silently discarded by Accelerate. Curious if others have found ways to introspect how PP is working with DeepSpeed + Accelerate.

sam-h-bean avatar Jun 07 '24 18:06 sam-h-bean

From the docs, it seems this is called out in a caveat. It might make sense to crash loudly when someone tries to configure PP directly. Also, what is the plan for integrating PP into Accelerate?
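One shape the "crash loudly" idea could take — this is a hypothetical guard sketched here for illustration, not part of Accelerate's actual API, and the assumption that the `"pipeline"` section is ignored is exactly the behavior this issue reports:

```python
# Hypothetical guard (not part of Accelerate): fail fast if the DeepSpeed
# config contains sections the integration would otherwise silently ignore.
UNSUPPORTED_SECTIONS = {"pipeline"}  # assumption based on this issue

def check_ds_config(cfg: dict) -> None:
    ignored = UNSUPPORTED_SECTIONS & cfg.keys()
    if ignored:
        raise ValueError(
            f"DeepSpeed config sections {sorted(ignored)} are not supported "
            "by this integration and would be silently ignored."
        )

check_ds_config({"zero_optimization": {"stage": 3}})  # passes quietly
# check_ds_config({"pipeline": {"stages": 8}})        # raises ValueError
```

Running a check like this at `Accelerator` setup time would have surfaced the problem immediately instead of letting the config options vanish.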

sam-h-bean avatar Jun 07 '24 19:06 sam-h-bean

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jul 08 '24 15:07 github-actions[bot]

Same question: does Accelerate support pipeline parallelism? Hoping this reopens the issue.

debnil-cws avatar Nov 20 '24 00:11 debnil-cws

Same question: does Accelerate support pipeline parallelism? Hoping this reopens the issue.

hashiting avatar Mar 06 '25 06:03 hashiting