Jianguo Zhang
@muellerzr Hi Zachary, sorry for the late reply (I just regained access to TPUs). When I run `accelerate tpu-config` in Step 5, it returns errors: `Failed to execute command on multiple...
@muellerzr Thanks for your instructions! We have tried the above steps, and most commands, such as `python3 -m torch_xla.distributed.xla_dist`, work across pods on a v3-32. Only `accelerate tpu-config` does not...
@zehuichen123 Absolutely!
Hi @muellerzr, thanks for your detailed instructions. Whether we create a new pod following the above process or start from Step 3, it seems that Step 5 still does not...
Which file are you running? `set(labels)` is used to find the unique labels.
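For illustration, here is a minimal sketch of what `set(labels)` does, assuming `labels` is a plain Python list of strings. The values and the `label2id` mapping are made up for this example, and the sorting is an addition to get a stable order, not something taken from the script in question:

```python
# Hypothetical example labels; the real script's labels will differ.
labels = ["cat", "dog", "cat", "bird", "dog"]

# set() drops duplicates; sorted() gives a deterministic order,
# since a set has no guaranteed ordering.
unique_labels = sorted(set(labels))

# A common follow-up step (assumed here): map each unique label to an integer id.
label2id = {label: idx for idx, label in enumerate(unique_labels)}

print(unique_labels)
print(label2id)
```

Without `sorted`, the order of the unique labels can change between runs, which matters if the ids are saved with a model.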
@Hypothesis-Z It seems that https://github.com/huggingface/transformers/pull/35157 still does not resolve the issue.