amithrm

Results 16 comments of amithrm

@JackCaoG sure, will add tests

@JackCaoG I changed the initialization a bit to take into account how SLURM configures the devices. Please take a look at it, and also at the test cases. All of these...
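
For context, here is a rough, hypothetical sketch of the kind of SLURM-driven setup I mean. The variables below (SLURM_PROCID, SLURM_NTASKS, SLURM_LOCALID) are standard SLURM environment variables, but the actual initialization code in the PR may differ:

```
# Hypothetical sketch only -- not the PR code. It shows how process placement
# can be derived from the environment variables SLURM sets for each srun task.
import os


def slurm_process_info():
    global_rank = int(os.environ.get('SLURM_PROCID', 0))  # global task index
    world_size = int(os.environ.get('SLURM_NTASKS', 1))   # total number of tasks
    local_rank = int(os.environ.get('SLURM_LOCALID', 0))  # task index on this node
    return global_rank, world_size, local_rank


if __name__ == '__main__':
    rank, world, local = slurm_process_info()
    print('rank={} world_size={} local_rank={}'.format(rank, world, local))
```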

We did some internal testing. It appears that at scale we see issues with the setup of gRPC channels. We should understand if you see similar issues at your...

@JackCaoG A simple test that you can run on GPU-XLA:

```
import sys
import torch
import torch_xla
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp
import os


def _mp_fn(index):
    print('XRT_LOCAL_WORKER:{}'.format(os.environ['XRT_LOCAL_WORKER']))
    print('XRT_DEVICE_MAP:{}'.format(os.environ['XRT_DEVICE_MAP']))
    print('XRT_WORKERS:{}'.format(os.environ['XRT_WORKERS']))
    print('XRT_HOST_WORLD_SIZE:{}'.format(os.environ['XRT_HOST_WORLD_SIZE']))
    ...
```

Run command:

```
GPU_NUM_DEVICES=2 python3 allreduce_xla.py
```

This will output:

```
XRT_LOCAL_WORKER:localservice:0
XRT_DEVICE_MAP:GPU:0;/job:localservice/replica:0/task:0/device:XLA_GPU:0|GPU:1;/job:localservice/replica:0/task:1/device:XLA_GPU:0
XRT_WORKERS:localservice:0;grpc://dfda805bbe4b:49887|localservice:1;grpc://dfda805bbe4b:33097
XRT_LOCAL_WORKER:localservice:1
XRT_DEVICE_MAP:GPU:0;/job:localservice/replica:0/task:0/device:XLA_GPU:0|GPU:1;/job:localservice/replica:0/task:1/device:XLA_GPU:0
XRT_WORKERS:localservice:0;grpc://dfda805bbe4b:49887|localservice:1;grpc://dfda805bbe4b:33097
```

If you look at XRT_WORKERS, it has the gRPC string for each worker. This won't scale...
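
For completeness, here is a minimal, hypothetical sketch of what allreduce_xla.py could look like. The env-var prints match the snippet above; everything past them (the all-reduce body and the spawn call) is my assumption, not the exact test file:

```
# Hypothetical allreduce_xla.py sketch; only the env-var prints are taken from
# the snippet above, the rest is an assumed minimal all-reduce exercise.
import os

import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    # Dump the XRT topology torch_xla derived for this process.
    print('XRT_LOCAL_WORKER:{}'.format(os.environ['XRT_LOCAL_WORKER']))
    print('XRT_DEVICE_MAP:{}'.format(os.environ['XRT_DEVICE_MAP']))
    print('XRT_WORKERS:{}'.format(os.environ['XRT_WORKERS']))

    # Trivial all-reduce to exercise the collective path across workers.
    device = xm.xla_device()
    t = torch.ones(4, device=device) * (index + 1)
    t = xm.all_reduce(xm.REDUCE_SUM, t)
    xm.mark_step()
    print('rank {} all_reduce result: {}'.format(index, t.cpu()))


if __name__ == '__main__':
    xmp.spawn(_mp_fn, args=())
```

Running it with GPU_NUM_DEVICES=2 as above should produce the env-var dump shown, plus the reduced tensors.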

Hi Jack, thanks for the pointers! I went over the code flow. The xmp.spawn() code pasted above takes the same code path as the GPU_NUM_DEVICES flow. In my understanding (I will...

Looks like the file needed for the test (allreduce_torchrun.py) is not getting picked up. Checking with @will-cromar on how to fix this. Some yapf fixes are also pending in one file.

@will-cromar I see a build failure: `NameError: name 'sympy' is not defined`