jonb377

29 comments by jonb377

@will-cromar Yep! I was able to run the [MNIST example](https://github.com/pytorch/examples/blob/main/mnist/main.py) to convergence using the XLA device, and verified GPU utilization with nvidia-smi. Currently looking into adding a CI test, I'll...

The CI failure is from `test_index_select_0dim`. I'm able to replicate locally, and it looks like the tensor's `size` isn't being preserved after `index_select` when on the XLA device, e.g.: ```...

Rebase after the TF pin update seems to have fixed it.

This is still an issue with the latest `fsspec==2023.9.2` and `gcsfs==2023.9.2`. Is there a workaround, or are there plans to address this? Our script uses `url_to_fs` to keep everything behind...

Hey Martin, thanks for the quick response! Unfortunately yes, here is the script I ran:

```python
import fsspec

fs = fsspec.filesystem('gcs')
with fs.transaction:
    with fs.open('/foo', 'wb') as f:
        f.write(b'This is...
```

Wow thank you @martindurant, this works perfectly! Really appreciate the quick turnaround on this 😄

@martindurant one last question - what does the release timeline look like for the fix?

Hey @mfatih7! The checkpoint format is different between normal checkpointing and distributed checkpointing. The two are not cross-compatible. To use `xser.save` checkpoints with SPMD, you'll need to load the checkpoint...

Hi @mfatih7, could you share some more details about the error? It sounds like you're getting different results on TPU compared to CPU. This can be expected to a degree,...
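The comment above is cut off before it explains why results can differ across devices. As an assumption (not taken from the thread), one generic source of small differences is that floating-point addition is not associative, so backends that reduce values in a different order can disagree in the last bits. A minimal sketch in plain Python, with a hypothetical `outputs_match` helper and illustrative tolerances:

```python
import math

# Floating-point addition is not associative: summing the same values in a
# different order can change the last bits of the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)


def outputs_match(a, b, rel_tol=1e-6, abs_tol=1e-8):
    """Hypothetical helper: compare two scalars within a tolerance.

    The tolerance values are illustrative assumptions, not recommendations
    from the thread.
    """
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


# Exact equality fails even though the two sums are mathematically equal.
print(left_to_right == right_to_left)
# A tolerance-based comparison treats them as matching.
print(outputs_match(left_to_right, right_to_left))
```

When comparing outputs from two devices, a tolerance-based check like this is usually more informative than exact equality. Differences far larger than rounding error would point to something other than evaluation order.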

@JackCaoG is such a difference expected between TPU and CPU? @mfatih7 Do you know if this is also the case for non-SPMD TPU? If you can share a minimal repro...