
DeepSpeedEngineWrapper.backward() does a bit too much

Open • zhc7 opened this issue 1 year ago • 18 comments

The source code of DeepSpeedEngineWrapper:

class DeepSpeedEngineWrapper:
    """
    Internal wrapper for deepspeed.runtime.engine.DeepSpeedEngine. This is used to follow conventional training loop.

    Args:
        engine (deepspeed.runtime.engine.DeepSpeedEngine): deepspeed engine to wrap
    """

    def __init__(self, engine):
        self.engine = engine

    def backward(self, loss, **kwargs):
        # runs backpropagation and handles mixed precision
        self.engine.backward(loss, **kwargs)

        # Deepspeed's `engine.step` performs the following operations:
        # - gradient accumulation check
        # - gradient clipping
        # - optimizer step
        # - zero grad
        # - checking overflow
        # - lr_scheduler step (only if engine.lr_scheduler is not None)
        self.engine.step()
        # and this plugin overrides the above calls with no-ops when Accelerate runs under
        # Deepspeed, but allows normal functionality for non-Deepspeed cases thus enabling a simple
        # training loop that works transparently under many training regimes.

My question is: why do we need to call self.engine.step() here immediately? This zeros the gradients and updates the parameters without the user being aware of it, which can be unexpected. Because the backward pass is internally bound to zeroing the gradients and updating the parameters, users are blocked from inspecting the gradients or parameters manually before stepping.

I know deepspeed-wrapped models can't be treated as normal models, but this behavior still eliminates a lot of flexibility.
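For example, a pattern like the following is not possible (rough sketch; model, optimizer and accelerator are the objects returned by accelerator.prepare(), and the gradient printout is just illustrative), because by the time accelerator.backward() returns under DeepSpeed, engine.step() has already updated the parameters and zeroed the gradients:

outputs = model(**batch)
loss = outputs.loss
accelerator.backward(loss)  # under DeepSpeed this also runs engine.step()

# intended: inspect the gradients *before* the optimizer step,
# but they have already been consumed and zeroed inside accelerator.backward()
for name, param in model.named_parameters():
    if param.grad is not None:
        print(name, param.grad.norm().item())

optimizer.step()       # no-op under DeepSpeed (overridden by Accelerate's wrapper)
optimizer.zero_grad()  # likewise a no-op under DeepSpeed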

zhc7 • Jul 22 '24

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] • Aug 22 '24

Hitting this snag right now actually, will see what we decide to do

muellerzr • Sep 06 '24

We're working with the DS team to try to remove the engine entirely. However, as a user you can always call model.engine.backward() etc. manually in Accelerate without harm.
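Roughly something like this (untested sketch; assumes model and train_dataloader came from accelerator.prepare() with DeepSpeed enabled, and that gradient accumulation is left to the DeepSpeed config):

engine = accelerator.deepspeed_engine_wrapped.engine

for batch in train_dataloader:
    loss = model(**batch).loss
    engine.backward(loss)  # backward pass only, no optimizer step yet

    # gradients can be inspected here, before they are consumed
    # (under ZeRO partitioning some .grad attributes may be None or sharded)

    engine.step()  # accumulation check, clipping, optimizer step, zero_grad, lr scheduler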

muellerzr • Sep 10 '24

Somehow

loss = loss / accelerator.gradient_accumulation_steps
accelerator.deepspeed_engine_wrapped.engine.backward(loss)

does not give results equivalent to accelerator.backward(loss) with DeepSpeed. What gives?
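(Possibly the difference is that accelerator.backward() under DeepSpeed does not do that division at all; engine.backward() appears to scale the loss by the accumulation steps itself, so dividing manually scales it twice, and it also calls engine.step(), which the snippet above skips. If so, something like this should be closer, though exact equivalence depends on the DeepSpeed config:)

wrapper = accelerator.deepspeed_engine_wrapped
wrapper.engine.backward(loss)  # DeepSpeed handles the gradient-accumulation scaling internally
wrapper.engine.step()          # accumulation check, clipping, optimizer step, zero_grad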

nom • Sep 23 '24

Same question here.

LinB203 • Oct 27 '24

Same question here. It's indeed very hard to understand why it was designed this way. I believe accelerator.backward(loss) should only perform the backward operation, and the other steps should be written outside this function in a more standard and understandable manner.

GreenWindow1997 • Nov 22 '24

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] • Dec 16 '24

Not stale.

LinB203 • Dec 16 '24

self.engine.module.named_parameters() might have the gradients you are looking for. I'm not sure about this, as I'm not the person who knows enough to answer these esoteric questions, but I ran into a similar problem of wanting to observe the model's gradients after the backward pass, and this solution worked for me.
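Roughly what I mean (sketch only; model here is the DeepSpeed engine returned by accelerator.prepare(), so model.module is the underlying nn.Module; under ZeRO-2/3 some .grad attributes may be None because the gradients are partitioned, in which case deepspeed.utils.safe_get_full_grad(param) may help):

# right after the backward pass, before the step that consumes the gradients
for name, param in model.module.named_parameters():
    if param.grad is not None:
        print(name, param.grad.norm().item())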

I hope it helps, and if it did, I'm glad to have been of help.

Happy Coding!

Nirmal-Adhikari-hub • Feb 07 '25

So for a bit more context, we've been waiting on this PR to happen, so hopefully we can give more flexibility soon: https://github.com/deepspeedai/DeepSpeed/pull/7018

muellerzr • Feb 11 '25

Hello, any update regarding this issue?

MaxHeuillet • Mar 17 '25

Any update?

hzphzp • Apr 10 '25

Does that mean if I use deepspeed integration with huggingface's Trainer, the parameters will be updated twice in each step?

Boltzmachine • Jun 20 '25

> Does that mean if I use deepspeed integration with huggingface's Trainer, the parameters will be updated twice in each step?

No, it won't. When DeepSpeed is active, Accelerate hands back wrapped optimizer (and scheduler) objects whose step() and zero_grad() are no-ops (that is what the "overrides the above calls with no-ops" comment in the wrapper source refers to), so the only real parameter update is the one performed inside engine.step().
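A rough sketch of the shape of those wrappers (simplified, not the exact Accelerate source):

class DeepSpeedOptimizerWrapper:
    # simplified; the real wrapper also wraps the underlying optimizer
    def step(self):
        # no-op: engine.step() inside accelerator.backward(loss) already did the real update
        pass

    def zero_grad(self, set_to_none=True):
        # no-op: DeepSpeed zeros the gradients itself inside engine.step()
        pass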

naomili0924 • Oct 23 '25

I created this PR to debug parameters for a short time: https://github.com/huggingface/accelerate/pull/3819. Feel free to use it. But even with it, this still depends heavily on the DeepSpeed engine.

naomili0924 • Oct 24 '25