Different `accelerate launch` behavior via ssh command
System Info
- `Accelerate` version: 0.15.0
- Platform: Linux-5.11.0-1021-gcp-x86_64-with-glibc2.29
- Python version: 3.8.10
- Numpy version: 1.21.3
- PyTorch version (GPU?): 1.10.0+cu102 (False)
- `Accelerate` config passed:
- compute_environment: LOCAL_MACHINE
- distributed_type: TPU
- mixed_precision: no
- use_cpu: False
- dynamo_backend: NO
- num_processes: 8
- machine_rank: 0
- num_machines: 1
- gpu_ids: None
- main_process_ip: None
- main_process_port: None
- rdzv_backend: static
- same_network: True
- main_training_function: main
- deepspeed_config: {}
- fsdp_config: {}
- megatron_lm_config: {}
- downcast_bf16: no
- tpu_name: None
- tpu_zone: None
- command_file: None
- commands: None
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
My test code (`main.py`):

```python
from accelerate import Accelerator

def main():
    accl = Accelerator()
    if accl.is_main_process:
        print("Master")
    accl.wait_for_everyone()
    print("ALL")

if __name__ == "__main__":
    main()
```
I ran the experiments below on a TPU v2-8 (software version `tpu-vm-pt-1.10`):

1. On the TPU VM: `accelerate launch main.py` -> works as expected (output matches the expected behavior below).
2. From a remote machine: `ssh <ip> "bash -l -c 'accelerate launch main.py'"` -> unexpected output:
```
Master
ALL
Master
ALL
Master
ALL
Master
ALL
Master
ALL
Master
ALL
Master
ALL
Master
ALL
```
Other Python/bash scripts run via ssh without any problem.
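For reference, a small diagnostic sketch that could help narrow this down (the filename `debug_env.py` and the environment-variable filter are my own additions, not part of the original test): it prints how each process sees the process group, and dumps TPU/XLA-related environment variables so the interactive shell can be compared with the non-interactive `bash -l -c` invocation.

```python
# debug_env.py -- hypothetical diagnostic, not part of the original repro
import os

from accelerate import Accelerator

def main():
    accl = Accelerator()
    # If ssh launching breaks the process group, every process will likely
    # report itself as index 0 of 1 and claim to be the main process.
    print(
        f"process_index={accl.process_index} "
        f"num_processes={accl.num_processes} "
        f"is_main_process={accl.is_main_process} "
        f"distributed_type={accl.distributed_type}"
    )
    # Dump TPU/XLA-related environment variables (heuristic filter) to spot
    # differences between an interactive login shell and `ssh ... bash -l -c`.
    for key, value in sorted(os.environ.items()):
        if any(token in key for token in ("TPU", "XRT", "XLA")):
            print(f"{key}={value}")

if __name__ == "__main__":
    main()
```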
Expected behavior
```
Master
ALL
ALL
ALL
ALL
ALL
ALL
ALL
ALL
```
cc @muellerzr