optimizer.step_was_skipped not correct in accelerator.accumulate
System Info
- `Accelerate` version: 0.30.1
- Platform: Linux-5.4.0-144-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /root/miniconda3/envs/qec/bin/accelerate
- Python version: 3.11.9
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.3.0 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- System RAM: 1007.53 GB
- GPU type: NVIDIA GeForce RTX 4090
- `Accelerate` default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: MULTI_GPU
- mixed_precision: no
- use_cpu: False
- debug: True
- num_processes: 8
- machine_rank: 0
- num_machines: 1
- gpu_ids: all
- rdzv_backend: static
- same_network: True
- main_training_function: main
- enable_cpu_affinity: True
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [X] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
The value of `optimizer.step_was_skipped` should, as its name suggests, be `True` whenever `optimizer.step()` is called but the step is not actually applied to the parameters. However, the flag is only updated here, inside the `if self.gradient_state.sync_gradients` branch. The standard implementation of gradient accumulation uses that same condition to decide whether to actually step the optimizer, so on accumulation-only steps the flag is never updated and `optimizer.step_was_skipped` is always `False`.
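
A minimal sketch to reproduce the behaviour (the model, data, and hyperparameters below are placeholders, and the comments describe the behaviour reported above):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(16, 8), torch.randn(16, 1)), batch_size=2)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for step, (x, y) in enumerate(dataloader):
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(x), y)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
    # Expected: True on accumulation-only steps (no real optimizer step),
    # False when the step is actually applied.
    # Observed: always False, because the flag is only updated inside the
    # sync_gradients branch.
    print(step, accelerator.sync_gradients, optimizer.step_was_skipped)
```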
Expected behavior
If this is the expected behaviour, rename `optimizer.step_was_skipped` or note this behaviour in the docstring. Otherwise, fix its logic so that it returns `True` when the step is skipped because gradients are being accumulated.
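
If the second option is taken, one possible shape of the fix, sketched under the assumption that `step_was_skipped` is backed by the internal `_is_overflow` flag of `AcceleratedOptimizer` (scaler/overflow handling omitted for brevity):

```python
# Simplified sketch of AcceleratedOptimizer.step(); `_is_overflow` is assumed
# to be the internal flag that the step_was_skipped property returns.
def step(self, closure=None):
    if self.gradient_state.sync_gradients:
        self.optimizer.step(closure)
        self._is_overflow = False  # a real step was applied to the parameters
    else:
        # Proposed change: gradients are only being accumulated, no step is
        # applied to the parameters, so report the step as skipped.
        self._is_overflow = True
```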