
optimizer.step_was_skipped not correct in accelerator.accumulate

Fadelis98 opened this issue 8 months ago · 1 comment

System Info

- `Accelerate` version: 0.30.1
- Platform: Linux-5.4.0-144-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /root/miniconda3/envs/qec/bin/accelerate
- Python version: 3.11.9
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.3.0 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- System RAM: 1007.53 GB
- GPU type: NVIDIA GeForce RTX 4090
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: MULTI_GPU
        - mixed_precision: no
        - use_cpu: False
        - debug: True
        - num_processes: 8
        - machine_rank: 0
        - num_machines: 1
        - gpu_ids: all
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - enable_cpu_affinity: True
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [X] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • [X] My own task or dataset (give details below)

Reproduction

The value of `optimizer.step_was_skipped` should, as the name suggests, be True whenever `optimizer.step()` is called but the step is not actually applied to the parameters. However, the logic is implemented here, inside the `if self.gradient_state.sync_gradients:` branch. The standard gradient accumulation implementation uses this same condition to decide whether to actually step the optimizer, so `optimizer.step_was_skipped` is always False on accumulation-only steps.
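
For reference, a minimal sketch of the observation (the model, data, and step count are placeholder assumptions, not the original script): with gradient accumulation enabled, `optimizer.step()` does not touch the parameters on the accumulation-only iterations, yet `step_was_skipped` still prints False.

```python
# Minimal sketch, not the original training script: model, data, and
# hyperparameters are placeholders; only the flag check matters here.
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(8):
    batch = torch.randn(2, 8, device=accelerator.device)
    with accelerator.accumulate(model):
        loss = model(batch).mean()
        accelerator.backward(loss)
        optimizer.step()
        # On accumulation-only iterations the wrapped optimizer returns
        # without applying the step, but this prints False every time.
        print(step, optimizer.step_was_skipped)
        optimizer.zero_grad()
```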

Expected behavior

If this is the expected behaviour, rename `optimizer.step_was_skipped` or note this behaviour in the docstring. Otherwise, fix its logic so that it returns True while gradients are being accumulated.
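
If the logic is to be fixed, one possible shape for it, written as a standalone wrapper rather than a patch to the real `AcceleratedOptimizer` (the class name and private attribute names here are illustrative assumptions, not the Accelerate source):

```python
# Sketch of the proposed behaviour, not the actual Accelerate implementation:
# step_was_skipped also covers the gradient-accumulation case. Names other
# than step_was_skipped / sync_gradients are assumptions.
import torch

class PatchedAcceleratedOptimizer:
    def __init__(self, optimizer, gradient_state, scaler=None):
        self.optimizer = optimizer
        self.gradient_state = gradient_state  # exposes .sync_gradients
        self.scaler = scaler                  # optional GradScaler
        self._step_was_skipped = False

    def step(self, closure=None):
        if not self.gradient_state.sync_gradients:
            # Gradients are still being accumulated: the inner optimizer is
            # never called, so record that this step was skipped.
            self._step_was_skipped = True
            return
        self._step_was_skipped = False
        if self.scaler is not None:
            scale_before = self.scaler.get_scale()
            self.scaler.step(self.optimizer, closure)
            self.scaler.update()
            # A shrinking loss scale means the scaler skipped the step
            # because of inf/NaN gradients.
            self._step_was_skipped = self.scaler.get_scale() < scale_before
        else:
            self.optimizer.step(closure)

    @property
    def step_was_skipped(self) -> bool:
        return self._step_was_skipped
```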

Fadelis98 · Jun 05 '24 09:06