
mnist-tpu-training.ipynb tutorial has incorrect dependencies

Open Wattsy2020 opened this issue 2 years ago • 1 comment

🐛 Bug

The MNIST TPU training tutorial installs incompatible dependencies. Running all cells from start to finish in Colab raises an ImportError when importing pytorch_lightning.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://github.com/PyTorchLightning/lightning-tutorials/blob/publication/.notebooks/lightning_examples/mnist-tpu-training.ipynb
  2. Click "copy raw contents", paste into a text file and save with extension ".ipynb"
  3. Upload to Colab https://colab.research.google.com/
  4. Click "Runtime" -> "Change runtime type" and select TPU as the hardware accelerator
  5. Click "Runtime" -> "Run all"
  6. An error will occur when running the 3rd cell

Here is the output of the 3rd import cell.

WARNING:root:Waiting for TPU to be start up with version pytorch-1.8...
WARNING:root:Waiting for TPU to be start up with version pytorch-1.8...
WARNING:root:TPU has started up successfully with version pytorch-1.8
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-9ad40618d134> in <module>()
      1 import torch
      2 import torch.nn.functional as F
----> 3 from pytorch_lightning import LightningDataModule, LightningModule, Trainer
      4 from torch import nn
      5 from torch.utils.data import DataLoader, random_split

9 frames
/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py in <module>()
     18 _PROJECT_ROOT = os.path.dirname(_PACKAGE_ROOT)
     19 
---> 20 from pytorch_lightning.callbacks import Callback  # noqa: E402
     21 from pytorch_lightning.core import LightningDataModule, LightningModule  # noqa: E402
     22 from pytorch_lightning.trainer import Trainer  # noqa: E402

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/__init__.py in <module>()
     12 # See the License for the specific language governing permissions and
     13 # limitations under the License.
---> 14 from pytorch_lightning.callbacks.base import Callback
     15 from pytorch_lightning.callbacks.device_stats_monitor import DeviceStatsMonitor
     16 from pytorch_lightning.callbacks.early_stopping import EarlyStopping

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/base.py in <module>()
     24 
     25 import pytorch_lightning as pl
---> 26 from pytorch_lightning.utilities.types import STEP_OUTPUT
     27 
     28 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py in <module>()
     16 import numpy
     17 
---> 18 from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
     19 from pytorch_lightning.utilities.distributed import AllGatherGrad, rank_zero_info, rank_zero_only  # noqa: F401
     20 from pytorch_lightning.utilities.enums import (  # noqa: F401

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py in <module>()
     27 
     28 if _TORCHTEXT_AVAILABLE:
---> 29     if _compare_version("torchtext", operator.ge, "0.9.0"):
     30         from torchtext.legacy.data import Batch
     31     else:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/imports.py in _compare_version(package, op, version, use_base_version)
     52     """
     53     try:
---> 54         pkg = importlib.import_module(package)
     55     except (ModuleNotFoundError, DistributionNotFound):
     56         return False

/usr/lib/python3.7/importlib/__init__.py in import_module(name, package)
    125                 break
    126             level += 1
--> 127     return _bootstrap._gcd_import(name[level:], package, level)
    128 
    129 

/usr/local/lib/python3.7/dist-packages/torchtext/__init__.py in <module>()
      3 from . import datasets
      4 from . import utils
----> 5 from . import vocab
      6 from . import legacy
      7 from ._extension import _init_extension

/usr/local/lib/python3.7/dist-packages/torchtext/vocab/__init__.py in <module>()
      9 )
     10 
---> 11 from .vocab_factory import (
     12     vocab,
     13     build_vocab_from_iterator,

/usr/local/lib/python3.7/dist-packages/torchtext/vocab/vocab_factory.py in <module>()
      2 from typing import Dict, Iterable, Optional, List
      3 from collections import Counter, OrderedDict
----> 4 from torchtext._torchtext import (
      5     Vocab as VocabPybind,
      6 )

ImportError: /usr/local/lib/python3.7/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZTVN5torch3jit6MethodE

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

Expected behavior

The 3rd cell and the remaining notebook complete without error.

Additional context

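This occurs because Colab now ships torchtext 0.11.0 by default, but XLA strictly requires torch, torchaudio, torchtext and torchvision to be the releases that match the installed torch_xla version (pytorch-1.8 here). The torchtext 0.11.0 binary was built against a newer torch, which is why loading _torchtext.so fails with an undefined symbol.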
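A quick way to confirm this kind of mismatch before importing anything is to compare the installed versions against the expected pins. The following is only a minimal sketch: the names `EXPECTED`, `installed_version` and `find_mismatches` are hypothetical helpers made up for illustration, and the pins are the ones proposed in this issue for the pytorch-1.8 TPU runtime. It assumes Python 3.8+ for `importlib.metadata`; on the Python 3.7 runtime shown in the traceback, the `importlib_metadata` backport would be needed instead.

```python
from importlib import metadata  # Python 3.8+; 3.7 needs the importlib_metadata backport

# Pins taken from the fix proposed in this issue (assumption: pytorch-1.8 TPU runtime).
EXPECTED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "torchaudio": "0.8.0",
    "torchtext": "0.9.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (found, wanted).

    Packages that are not installed are skipped, because a missing package
    cannot cause the binary incompatibility described in this issue.
    """
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Drop local build tags such as "+cu111" before comparing.
        if found is not None and found.split("+")[0] != wanted:
            mismatches[package] = (found, wanted)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` in a fresh Colab cell would then list any package, such as torchtext, whose installed version differs from the pin. The `lookup` parameter exists only so the check can be exercised without touching the real environment.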

The error can be fixed by changing the line

! pip install --quiet "pytorch-lightning>=1.3" "torchmetrics>=0.3" "torch>=1.6, <1.9" "torchvision"

to

! pip install --quiet "pytorch-lightning>=1.3" "torchmetrics>=0.3" "torch==1.8.0" "torchvision==0.9.0" "torchaudio==0.8.0" "torchtext==0.9.0"

and then doing a factory reset of the runtime if you already ran the original, broken cell.

This explicitly installs the correct versions of the torch libraries. I'm not sure how to create a pull request for this myself. I converted my notebook using Jupytext, and it seems that adding the following lines

# ## Setup
# This notebook requires some packages besides pytorch-lightning.

# %% colab={"base_uri": "https://localhost:8080/"} id="37f8b49a"
# ! pip install --quiet "pytorch-lightning>=1.3" "torchmetrics>=0.3" "torch==1.8.0" "torchvision==0.9.0" "torchaudio==0.8.0" "torchtext==0.9.0"

to the beginning of https://github.com/PyTorchLightning/lightning-tutorials/blob/main/lightning_examples/mnist-tpu-training/mnist-tpu.py should fix the issue. However, I'm not sure why there isn't a "Setup" section there already. Is there something about the CI system that I'm missing?

Wattsy2020, Dec 23 '21 11:12

Hi there, if I read it correctly, the problem comes from how the wider package version ranges get resolved, so pinning specific versions would help. Alternatively, you could use a variation of this version mapping from PL: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/requirements/adjust_versions.py

Borda, Jan 12 '22 08:01
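The version-mapping idea from the reply above could be sketched roughly as follows. This is not the code in adjust_versions.py; `VERSION_MAP` and `pinned_requirements` are hypothetical names, and only the 1.8.0 row is taken from this issue. Any further rows would have to come from the official compatibility tables.

```python
# Hypothetical mapping from a torch release to its matching companion releases.
# Only the 1.8.0 row comes from this issue; it is an illustration, not PL's table.
VERSION_MAP = {
    "1.8.0": {"torchvision": "0.9.0", "torchaudio": "0.8.0", "torchtext": "0.9.0"},
}


def pinned_requirements(torch_version, version_map=VERSION_MAP):
    """Return pip requirement strings that pin torch and its companions together."""
    # Ignore local build tags such as "+cu111" when looking up the release.
    base = torch_version.split("+")[0]
    if base not in version_map:
        raise KeyError(f"no companion versions recorded for torch {base}")
    pins = [f"torch=={base}"]
    pins += [f"{name}=={version}" for name, version in version_map[base].items()]
    return pins
```

Joining the result with spaces, for example `" ".join(pinned_requirements("1.8.0"))`, gives the argument list for a single pip install line. Raising on an unknown release, rather than falling back to unpinned packages, is a deliberate choice here, since a silent fallback would reintroduce the mismatch reported in this issue.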