
All DeepSpeed stages have the same memory footprint.

khalil-Hennara opened this issue 1 year ago · 1 comment

System Info

- `Accelerate` version: 0.27.2
- Platform: Linux-5.15.0-92-generic-x86_64-with-glibc2.35
- Python version: 3.10.13
- Numpy version: 1.26.3
- PyTorch version (GPU?): 2.2.0 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 1007.75 GB
- GPU type: NVIDIA A100-PCIE-40GB
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: DEEPSPEED
        - mixed_precision: bf16
        - use_cpu: False
        - debug: False
        - num_processes: 2
        - machine_rank: 0
        - num_machines: 1
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - deepspeed_config: {'gradient_accumulation_steps': 1, 'zero3_init_flag': False, 'zero_stage': 0}
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []
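For reference, here is a minimal sketch (not from the original report) of how the same `deepspeed_config` values could be passed programmatically through Accelerate's `DeepSpeedPlugin`; changing `zero_stage` here is one way to confirm which ZeRO stage is actually applied at runtime:

```python
# Sketch: programmatic equivalent of the deepspeed_config above.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=0,                    # matches the config above; set to 1/2/3 to compare stages
    gradient_accumulation_steps=1,
)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=deepspeed_plugin)
```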

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • [X] My own task or dataset (give details below)

Reproduction

I am using a modified version of run_clm_no_trainer.py with a TinyLlama model. I've tried DeepSpeed stages 0, 1, 2, and 3, and all of them have the same memory footprint (see the attached screenshot, which shows the output for all stages). I've also tried to increase the batch size with the different stages, but I couldn't use a larger batch size even with stage 3. The model is fairly small, yet for all stages I couldn't fit a batch size larger than 4 with a context length of 2048 tokens.
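For context, the per-stage comparison was done by looking at peak GPU memory; this is a hedged sketch of how that can be logged with standard PyTorch CUDA memory stats, not the exact code from my script:

```python
import torch

def log_peak_gpu_memory(tag: str) -> None:
    # Peak memory on the current CUDA device since the last reset, in GiB.
    allocated = torch.cuda.max_memory_allocated() / 2**30
    reserved = torch.cuda.max_memory_reserved() / 2**30
    print(f"[{tag}] peak allocated: {allocated:.2f} GiB, peak reserved: {reserved:.2f} GiB")

# Usage: call torch.cuda.reset_peak_memory_stats() before training,
# then log_peak_gpu_memory("stage 3") after a few optimizer steps.
```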

Expected behavior

I thought that the different DeepSpeed stages should reduce the memory footprint and increase the training time, depending on the stage. Stage 0 should have different memory usage than stages 1, 2, and 3; stage 1 should differ from stages 0, 2, and 3; and the same for stages 2 and 3.
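As a reference point for what I expected, DeepSpeed ships memory estimators that report the theoretical per-GPU model-state footprint under ZeRO-3; a hedged sketch (the TinyLlama checkpoint name below is an assumption for illustration, not necessarily the exact one I used):

```python
# Sketch: expected per-GPU model-state memory under ZeRO-3 on 2 GPUs.
from transformers import AutoModelForCausalLM
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
estimate_zero3_model_states_mem_needs_all_live(model, num_gpus_per_node=2, num_nodes=1)
```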

khalil-Hennara · Mar 02 '24

cc @pacman100

muellerzr · Mar 04 '24

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Apr 01 '24