Colab notebook link is broken
📚 Documentation
https://pytorch.org/xla/release/2.2/index.html#performance-profiling has a link to a Colab notebook that is broken (https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/pytorch-xla-profiling-colab.ipynb)
It looks like the contrib/colab directory doesn't even exist anymore.
I'm having a tough time getting PJRT running in Colab, so an example of how to do this would be really helpful.
Oh, our bad. Colab is on the old TPU Node architecture and does not support any release newer than PyTorch 2.0. Can you try Kaggle? @zpcore can you remove the outdated link in our doc?
You can find the Kaggle example at https://github.com/pytorch/xla/tree/master/contrib/kaggle
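For context, a minimal PJRT smoke test on a TPU VM looks roughly like the sketch below. This is an illustrative sketch rather than an official snippet: it assumes torch and a matching torch_xla build are already installed, and it only uses the standard PJRT_DEVICE environment variable and torch_xla.core.xla_model entry points.

import os
os.environ.setdefault("PJRT_DEVICE", "TPU")  # select the PJRT TPU runtime

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()              # first addressable XLA (TPU) device
x = torch.randn(2, 2, device=device)  # tensor allocated on the XLA device
y = x @ x                             # ops are recorded into a lazy graph
xm.mark_step()                        # cut the graph and execute it on the TPU
print(y.device, y.cpu())

If this runs and prints an xla device plus a 2x2 result, the PJRT runtime and libtpu are wired up correctly.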
Thanks, I'll check out Kaggle
It seems the Kaggle notebook environment is also somewhat broken. I have no idea if this is an xla issue, a JAX issue, or something else, but here's the error: https://www.kaggle.com/discussions/product-feedback/479523
hmm @will-cromar have you ever run into this?

2024-02-24 22:48:13.356169: F external/local_xla/xla/stream_executor/tpu/tpu_library_init_fns.inc:85] TpuUtil_GetXlaPadSizeFromTpuTopology not available in this library.
xla/stream_executor/tpu/tpu_library_init_fns.inc looks like a very outdated libtpu to me. We dropped support for StreamExecutor on PJRT about a year ago IIRC. Is this on the current Kaggle TPU VM environment?
Yes, it's the TPU VM environment that Kaggle calls "TPU VM v3-8"
Here's an example notebook: https://www.kaggle.com/code/tjohnson/notebookbf52281afd
It looks like there are two issues in your example notebook. First, the torch version hasn't been updated to 2.2 yet. I just sent a PR to do this: https://github.com/Kaggle/docker-python/pull/1364
Kaggle requires a special build of torch_xla that has libtpu bundled. Otherwise, it conflicts with the libtpu installed by JAX and/or TF. These builds are marked with +libtpu in our release bucket, e.g. https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.2.0%2Blibtpu-cp310-cp310-manylinux_2_28_x86_64.whl. The normal install path will not work on Kaggle.
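As a concrete sketch, installing that build in a Kaggle notebook cell would look something like the lines below. The wheel URL is the one from the release bucket above; the torch version pin and the cp310 tag are assumptions you would adjust to match the notebook's Python and torch versions.

!pip install torch~=2.2.0  # assumed matching torch version; adjust as needed
!pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.2.0%2Blibtpu-cp310-cp310-manylinux_2_28_x86_64.whl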
For importing transformers, this is a known issue: https://github.com/pytorch/xla/issues/5625#issuecomment-1743493309
The workaround is to replace the TF TPU package:
!pip uninstall --yes tensorflow
!pip install tensorflow-cpu
Thanks so much for looking into this @will-cromar !
I'll switch the tensorflow versions as recommended.
I noticed that the docker-python PR is failing CI, although I don't have permissions to see the details: https://github.com/Kaggle/docker-python/commits/main/
Thanks for the heads up. I'll work with the Kaggle team to get that image updated.