mars1248

Results 5 issues of mars1248

Unable to download; clicking "download pdf" does nothing.

## 🐛 Bug ## To Reproduce Steps to reproduce the behavior: 1. `PJRT_DEVICE=CUDA python test_train_spmd_imagenet.py --fake_data --batch_size 16 --model=resnet50 --sharding=batch --profile` 2. Another process calls `xp.trace('localhost:9012', '/tmp/tensorboard')` 3. The main process...

xla:gpu

## ❓ Questions and Help I have single-machine SPMD training working, but I do not know how to run multi-machine SPMD training. Could you give me...

## ❓ Questions and Help FSDP can be expressed well in SPMD, but HSDP does not seem to be expressible. Is there any way to express HSDP in SPMD?

## ❓ Questions and Help In my test code, I found that an argument may have the type PjRtData (the argument is a scalar), which then leads to a core dump. https://github.com/pytorch/xla/blob/master/torch_xla/csrc/runtime/pjrt_computation_client.cc#L806...