`Accelerator.process_index` only shows 0 in a 4-GPU environment
System Info
- `Accelerate` version: 0.34.2
- Platform: Linux-5.15.0-1035-aws-x86_64-with-glibc2.31
- `accelerate` bash location: /home/ubuntu/abpani/FundName/myenv/bin/accelerate
- Python version: 3.10.14
- Numpy version: 2.1.1
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch MUSA available: False
- System RAM: 186.72 GB
- GPU type: NVIDIA A10G
- `Accelerate` default config:
Not found
Information
- [ ] The official example scripts
- [x] My own modified scripts
Tasks
- [x] One of the scripts in the `examples/` folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [ ] My own task or dataset (give details below)
Reproduction
```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin, gather_object

DEEPSPEED_CONFIG = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": False},
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "gather_16bit_weights_on_model_save": True,
        "round_robin_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 10,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": False,
}

deepspeed_plugin = DeepSpeedPlugin(hf_ds_config=DEEPSPEED_CONFIG)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin, mixed_precision="fp16")

# each GPU creates a string
message = [f"Hello this is GPU {accelerator.process_index}"]

# collect the messages from all GPUs
messages = gather_object(message)

# output the messages only on the main process with accelerator.print()
accelerator.print(messages)
```

Output:

```
['Hello this is GPU 0']
```
Expected behavior
It should show all 4 GPUs in the output. Also, with Accelerate I am not able to fine-tune the Mistral NeMo model with a batch size larger than 1.
Hey @abpani! This is expected: with `accelerator.print`, only one process (the main process) prints. Just use `print` in your case. For more information, see: https://huggingface.co/docs/accelerate/en/basic_tutorials/execution#execute-on-one-process LMK if this solves your issue!
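To make the distinction concrete, below is a minimal, stdlib-only simulation; it does not use Accelerate or any GPU. It assumes four processes were launched and approximates `gather_object` as an in-order concatenation of each rank's list. `gather_like` and `simulate_console` are hypothetical helpers, not Accelerate APIs. Plain `print` produces one line per rank, while the main-process-only path (the `accelerator.print` behaviour) produces a single line from rank 0.

```python
from typing import List


def gather_like(per_rank: List[List[str]]) -> List[str]:
    """Hypothetical stand-in for gather_object: concatenate each rank's list in rank order."""
    return [msg for chunk in per_rank for msg in chunk]


def simulate_console(world_size: int, main_process_only: bool) -> List[str]:
    """Return the console lines a launch of `world_size` processes would produce.

    main_process_only=False models calling plain print() on every rank;
    main_process_only=True models accelerator.print(), which writes only on rank 0.
    """
    # Each rank builds its own one-element list, as in the reproduction script.
    per_rank = [[f"Hello this is GPU {rank}"] for rank in range(world_size)]
    console: List[str] = []
    for rank in range(world_size):
        # After the gather, every rank holds the same combined list.
        messages = gather_like(per_rank)
        if main_process_only and rank != 0:
            continue  # non-main ranks stay silent
        console.append(f"rank {rank}: {messages}")
    return console


if __name__ == "__main__":
    for line in simulate_console(world_size=4, main_process_only=False):
        print(line)  # four lines, one per rank
    for line in simulate_console(world_size=4, main_process_only=True):
        print(line)  # one line, from rank 0 only
```

In the sketch, the single main-process line still lists all four messages, because every rank holds the gathered list before the rank-0 check is applied.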