Different `accelerate launch` behavior via ssh command
System Info
- `Accelerate` version: 0.15.0
- Platform: Linux-5.11.0-1021-gcp-x86_64-with-glibc2.29
- Python version: 3.8.10
- Numpy version: 1.21.3
- PyTorch version (GPU?): 1.10.0+cu102 (False)
- `Accelerate` config passed:
- compute_environment: LOCAL_MACHINE
- distributed_type: TPU
- mixed_precision: no
- use_cpu: False
- dynamo_backend: NO
- num_processes: 8
- machine_rank: 0
- num_machines: 1
- gpu_ids: None
- main_process_ip: None
- main_process_port: None
- rdzv_backend: static
- same_network: True
- main_training_function: main
- deepspeed_config: {}
- fsdp_config: {}
- megatron_lm_config: {}
- downcast_bf16: no
- tpu_name: None
- tpu_zone: None
- command_file: None
- commands: None
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
My test code (`main.py`):

```python
from accelerate import Accelerator

def main():
    accl = Accelerator()
    if accl.is_main_process:
        print("Master")
    accl.wait_for_everyone()
    print("ALL")

if __name__ == "__main__":
    main()
```
I ran the experiments below on a TPU v2-8 (software version `tpu-vm-pt-1.10`):

1. On the TPU VM: `accelerate launch main.py` -> works as expected (output matches the expected behavior below).
2. From a remote machine: `ssh <ip> "bash -l -c 'accelerate launch main.py'"` -> unexpected output:
```
Master
ALL
Master
ALL
Master
ALL
Master
ALL
Master
ALL
Master
ALL
Master
ALL
Master
ALL
```
Other Python/bash scripts run via ssh without any problem.
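For reference, a small diagnostic sketch that could help narrow this down (the filename `debug_env.py` and the environment-variable filter are my own additions, not part of the original test): it prints how each process sees the process group, and dumps TPU/XLA-related environment variables so the interactive shell can be compared with the non-interactive `bash -l -c` invocation.

```python
# debug_env.py -- hypothetical diagnostic, not part of the original repro
import os

from accelerate import Accelerator

def main():
    accl = Accelerator()
    # If ssh launching breaks the process group, every process will likely
    # report itself as index 0 of 1 and claim to be the main process.
    print(
        f"process_index={accl.process_index} "
        f"num_processes={accl.num_processes} "
        f"is_main_process={accl.is_main_process} "
        f"distributed_type={accl.distributed_type}"
    )
    # Dump TPU/XLA-related environment variables (heuristic filter) to spot
    # differences between an interactive login shell and `ssh ... bash -l -c`.
    for key, value in sorted(os.environ.items()):
        if any(token in key for token in ("TPU", "XRT", "XLA")):
            print(f"{key}={value}")

if __name__ == "__main__":
    main()
```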
Expected behavior
```
Master
ALL
ALL
ALL
ALL
ALL
ALL
ALL
ALL
```
cc @muellerzr