All DeepSpeed stages have the same memory footprint.
System Info
- `Accelerate` version: 0.27.2
- Platform: Linux-5.15.0-92-generic-x86_64-with-glibc2.35
- Python version: 3.10.13
- Numpy version: 1.26.3
- PyTorch version (GPU?): 2.2.0 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 1007.75 GB
- GPU type: NVIDIA A100-PCIE-40GB
- `Accelerate` default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: DEEPSPEED
- mixed_precision: bf16
- use_cpu: False
- debug: False
- num_processes: 2
- machine_rank: 0
- num_machines: 1
- rdzv_backend: static
- same_network: True
- main_training_function: main
- deepspeed_config: {'gradient_accumulation_steps': 1, 'zero3_init_flag': False, 'zero_stage': 0}
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
I am using a modified version of `run_clm_no_trainer.py` with a TinyLlama model. I have tried DeepSpeed stages 0, 1, 2, and 3, and all of them have the same memory footprint.
The output is identical across all stages. I also tried to increase the batch size with the different stages, but I could not fit a larger batch even with stage 3. The model is fairly small, yet for every stage the largest batch size I could use was 4 with a context length of 2048 tokens.
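For context, here is a minimal sketch of the kind of setup I am testing, launched with `accelerate launch --num_processes 2 script.py`. This is an illustration, not my exact script: the model id, learning rate, and the dummy data are placeholders, and I pass the ZeRO stage through `DeepSpeedPlugin` here only to make the stage explicit.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin
from transformers import AutoModelForCausalLM

# Placeholder model id; my real script is a modified run_clm_no_trainer.py.
MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Request the ZeRO stage explicitly so the default accelerate config
# (which above shows `zero_stage: 0`) cannot silently override it.
plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=1)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=plugin)

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Dummy batches (batch size 4, context length 2048) so DeepSpeed can infer
# the micro batch size from the prepared dataloader.
tokens = torch.randint(0, 32000, (8, 2048))
loader = DataLoader(TensorDataset(tokens), batch_size=4)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

# One training step, then report peak GPU memory for this stage.
for (batch,) in loader:
    outputs = model(input_ids=batch, labels=batch)
    accelerator.backward(outputs.loss)
    optimizer.step()
    optimizer.zero_grad()
    break

accelerator.print(f"max allocated: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```

Running this with `zero_stage` set to 0, 1, 2, and 3 reports the same peak memory for me.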
Expected behavior
I expected the different DeepSpeed stages to reduce the memory footprint (and change training time) depending on the stage: stage 0 should have different memory usage than stages 1, 2, and 3; stage 1 should differ from stages 0, 2, and 3; and likewise for stages 2 and 3.
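To make that expectation concrete, here is my own back-of-envelope estimate using the ZeRO paper accounting (2 bytes/param for bf16 weights, 2 for bf16 gradients, 12 for fp32 Adam states), ignoring activations. The TinyLlama parameter count is approximate; these numbers are an estimate, not something I measured.

```python
# Rough per-GPU memory (GiB) for mixed-precision Adam, per the ZeRO paper:
# 2*Psi (bf16 params) + 2*Psi (bf16 grads) + 12*Psi (fp32 optimizer states).
PSI = 1.1e9   # approximate TinyLlama parameter count
N = 2         # number of GPUs
GiB = 2**30

stage0 = (2 * PSI + 2 * PSI + 12 * PSI) / GiB          # nothing sharded
stage1 = (2 * PSI + 2 * PSI + 12 * PSI / N) / GiB      # optimizer states sharded
stage2 = (2 * PSI + (2 * PSI + 12 * PSI) / N) / GiB    # + gradients sharded
stage3 = ((2 * PSI + 2 * PSI + 12 * PSI) / N) / GiB    # + parameters sharded

print(f"stage 0: {stage0:.1f} GiB, stage 1: {stage1:.1f} GiB, "
      f"stage 2: {stage2:.1f} GiB, stage 3: {stage3:.1f} GiB")
# Roughly: stage 0 ~16.4, stage 1 ~10.2, stage 2 ~9.2, stage 3 ~8.2 GiB
```

So on 2 GPUs I expected several GiB of difference between stages before activations, which is why identical footprints across all four stages looks wrong to me.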
cc @pacman100
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.