alias-free-gan

TPU issues in Colab

Open vsemecky opened this issue 3 years ago • 1 comment

I'd like to thank you in advance for the work you're doing.

I tried the code on a TPU and ran into two problems that may be related. Both errors occur only on the TPU instance; the GPU instances are OK. This is not a priority for me; I just wanted to report what I found.

1) python install.py prints the following error at the end:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behavior is the source of the following dependency conflicts.
earthengine-api 0.1.278 requires google-api-python-client<2,>=1.12.1, but you have google-api-python-client 1.8.0 which is incompatible.
Successfully installed cloud-tpu-client-0.10 google-api-python-client-1.8.0 torch-xla-1.9.1

2) trainer.py fails with the following error, even if I just run python scripts/trainer.py --help:

Traceback (most recent call last):
  File "scripts/trainer.py", line 10, in <module>
    import pytorch_lightning as pl
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
    from pytorch_lightning.metrics.utils import deprecated_metrics, void
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/utils.py", line 29, in <module>
    from pytorch_lightning.utilities import rank_zero_deprecation
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 27, in <module>
    from pytorch_lightning.utilities.imports import _compare_version, _TORCHTEXT_AVAILABLE
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/imports.py", line 93, in <module>
    from pytorch_lightning.utilities.xla_device import XLADeviceUtils  # noqa: E402
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/xla_device.py", line 23, in <module>
    import torch_xla.core.xla_model as xm
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/__init__.py", line 101, in <module>
    import _XLAC
ImportError: /usr/local/lib/python3.7/dist-packages/_XLAC.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8data_ptrIN3c107complexIfEEEEPT_v

vsemecky, Sep 13 '21 16:09

I haven't played around with TPUs in a while. I might check it out when I have some extra time, but you have to set the TPU environment variables. Try running the following cell:

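import os

# Copy every `export NAME=value` line from the setup script into os.environ.
with open('./scripts/tpu_setup.sh') as f:
    os.environ.update(
        line.strip().replace('export ', '', 1).split('=', 1)
        for line in f
        # Only real export assignments; skips comments and other commands.
        if line.strip().startswith('export ') and '=' in line
    )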
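For anyone who wants to see what the cell above does before touching the real environment, here is a minimal sketch of the same parsing step as a standalone function. The name `parse_exports` and the `EXAMPLE_*` variables are made up for illustration; the actual contents of `scripts/tpu_setup.sh` are not shown in this thread, so the sample lines are an assumption. The sketch also strips one pair of surrounding quotes from each value, which may or may not be needed for the real script.

```python
from typing import Dict, Iterable


def parse_exports(lines: Iterable[str]) -> Dict[str, str]:
    """Collect NAME=value pairs from shell lines of the form `export NAME=value`."""
    env = {}
    for raw in lines:
        line = raw.strip()
        # Skip comments, blank lines, and anything that is not an export assignment.
        if not line.startswith("export ") or "=" not in line:
            continue
        name, value = line[len("export "):].split("=", 1)
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[name.strip()] = value
    return env


# Hypothetical script contents, for illustration only.
sample = [
    "#!/bin/bash\n",
    "# export is mentioned in this comment\n",
    'export EXAMPLE_TPU_ADDR="grpc://10.0.0.2:8470"\n',
    "export EXAMPLE_FLAG=1\n",
]
parsed = parse_exports(sample)
print(parsed)
```

To apply it for real, you could pass the open file object to `parse_exports` and hand the result to `os.environ.update(...)`, then check the printed dictionary against what you expect the script to set.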

duskvirkus, Sep 16 '21 23:09