
Multiple colab notebooks from the documentation don't work and get stuck on the imports

Open BaruchG opened this issue 3 years ago • 6 comments

🐛 Bug

At a minimum, the notebooks at https://pytorch-lightning.readthedocs.io/en/latest/notebooks/lightning_examples/mnist-hello-world.html and https://pytorch-lightning.readthedocs.io/en/latest/notebooks/lightning_examples/cifar10-baseline.html don't work on Colab when opened through the "colab" link at the top of the page. Running the first cell, which installs the dependencies, returns:

torchtext 0.13.0 requires torch==1.12.0, but you have torch 1.8.1 which is incompatible.
torchaudio 0.12.0+cu113 requires torch==1.12.0, but you have torch 1.8.1 which is incompatible.

So it looks like the pinned version numbers need to be updated to account for that. If I then run the next cell, which contains the imports, I get OSError: /usr/local/lib/python3.7/dist-packages/torchtext/lib/libtorchtext.so: undefined symbol: _ZTVN5torch3jit6MethodE
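
For anyone triaging similar output, here is a minimal sketch that pulls the conflicting pairs out of pip's messages. The regular expression and the helper name `parse_conflicts` are my own, not part of pip, and the pattern only covers the "requires ..., but you have ..." wording quoted above.

```python
import re

# Matches lines such as:
# "torchtext 0.13.0 requires torch==1.12.0, but you have torch 1.8.1 which is incompatible."
CONFLICT = re.compile(
    r"(?P<pkg>\S+) (?P<pkg_version>\S+) requires (?P<requirement>\S+), "
    r"but you have (?P<dep>\S+) (?P<installed>\S+) which is incompatible\."
)


def parse_conflicts(pip_output):
    """Return one dict per conflict line found in pip's output (hypothetical helper)."""
    return [match.groupdict() for match in CONFLICT.finditer(pip_output)]


# The two lines reported in this issue.
output = """\
torchtext 0.13.0 requires torch==1.12.0, but you have torch 1.8.1 which is incompatible.
torchaudio 0.12.0+cu113 requires torch==1.12.0, but you have torch 1.8.1 which is incompatible.
"""

conflicts = parse_conflicts(output)
for c in conflicts:
    print(f"{c['pkg']} {c['pkg_version']} wants {c['requirement']}, found {c['dep']} {c['installed']}")
```

Both lines point at the same cause: the packages preinstalled on Colab were built against torch 1.12.0, while the notebook's first cell installs torch 1.8.1.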

To Reproduce

Run the cells in the colab notebooks linked to above.

Expected behavior

For the notebook to be able to run without needing to modify it.

BaruchG avatar Jul 25 '22 21:07 BaruchG

Hi @BaruchG, thank you for reporting this issue! Would you be interested in updating the notebook?

akihironitta avatar Jul 25 '22 21:07 akihironitta

Sure @akihironitta, I'd be happy to. It works if I add "torchtext==0.9.1" to the pip install in the first cell. Is it OK if it's pinned to that version of torchtext?

BaruchG avatar Jul 25 '22 21:07 BaruchG

Is it happening only for the build, or can you reproduce it locally? If locally, would you mind sharing details about your environment?

Borda avatar Jul 25 '22 23:07 Borda

@Borda If I run the first cell locally, I get:

ERROR: tensorboard 2.9.1 has requirement protobuf<3.20,>=3.9.2, but you'll have protobuf 3.20.1 which is incompatible.
ERROR: torchvision 0.13.0 has requirement torch==1.12.0, but you'll have torch 1.8.1 which is incompatible.

It looks like the conflict comes from torch being pinned while torchvision isn't, at least for that dependency on my system. If I ignore the error and continue with the notebook, lightning is installed and I can carry on. I'm not sure why Colab has issues with other packages, or why that breaks the lightning installation. Pinning torchtext does take care of Colab, though. Should I do that? My system is:

* CUDA:
	- GPU:
	- available:         False
	- version:           10.2
* Packages:
	- lightning:         None
	- lightning_app:     None
	- numpy:             1.21.6
	- pyTorch_debug:     False
	- pyTorch_version:   1.8.1+cu102
	- pytorch-lightning: 1.6.5
	- tqdm:              4.64.0
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- 
	- processor:         x86_64
	- python:            3.7.6
	- version:           #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022

BaruchG avatar Jul 28 '22 16:07 BaruchG

Hi @BaruchG, thank you for showing your interest in contributing, and sorry for the delay!

It works if I add "torchtext==0.9.1" to the pip install in the first cell. Is it ok if it's pinned to that version of torchtext?

We will soon be releasing PL 1.7, which drops PyTorch 1.8 support, so I think one reasonable fix would be to use a newer version of PyTorch together with matching versions of the PyTorch-related packages, such as torchvision, following the compatibility matrix. The matrix can be found in the following script of ours, where it is copied from the packages' official READMEs: https://github.com/Lightning-AI/lightning/blob/a90ef3b751816fbcc2b7d45efcd38714a4f6c19b/requirements/pytorch/adjust-versions.py#L8-L14
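
As a rough sketch of what following the compatibility matrix could look like in practice: the table below contains only the pairings that appear in this thread (the 1.12.0 row from the error messages, the 1.8.1 row from the torchtext pin that was reported to work), and it is not the content of the linked script. The names `COMPATIBLE` and `pins_for` are hypothetical.

```python
# Illustrative entries only, taken from messages in this thread.
# The linked adjust-versions.py script holds the actual matrix.
COMPATIBLE = {
    "1.12.0": {"torchvision": "0.13.0", "torchtext": "0.13.0", "torchaudio": "0.12.0"},
    "1.8.1": {"torchtext": "0.9.1"},
}


def pins_for(torch_version):
    """Build pip requirement strings for torch and its known companion packages."""
    base = torch_version.split("+")[0]  # drop a local build tag such as "+cu102"
    companions = COMPATIBLE.get(base)
    if companions is None:
        raise KeyError(f"no compatibility entry for torch {base}")
    return [f"torch=={base}"] + [f"{name}=={ver}" for name, ver in companions.items()]


print(" ".join(pins_for("1.12.0")))
```

The idea is that the notebook's install cell pins torch and its companion packages as one consistent set rather than pinning torch alone.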

akihironitta avatar Jul 28 '22 18:07 akihironitta

Looking at this again, I think we should update the installation to:

- ! pip install --quiet "seaborn" "pytorch-lightning>=1.4" "ipython[notebook]" "torch>=1.6, <1.9" "pandas" "torchvision" "torchmetrics>=0.6"
+ ! pip install --quiet "seaborn" "pytorch-lightning>=1.4" "ipython[notebook]" "torch>=1.9, <1.13" "pandas" "torchvision" "torchmetrics>=0.6"

Alternatively, we could remove the version pins completely and print the installed versions, so that the built notebooks record the specific versions used in each successful run:

- ! pip install --quiet "seaborn" "pytorch-lightning>=1.4" "ipython[notebook]" "torch>=1.6, <1.9" "pandas" "torchvision" "torchmetrics>=0.6"
+ ! pip install --quiet "seaborn" "pytorch-lightning" "ipython[notebook]" "torch" "pandas" "torchvision" "torchmetrics"
+ ! pip list | grep torch
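
The same printout could also be produced from Python instead of `pip list | grep`. This is only a sketch: it assumes Python 3.8 or newer for `importlib.metadata`, and the helper name `installed_versions` and the package list are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages from the install cell whose versions matter most here.
packages = ["torch", "torchvision", "torchmetrics", "pytorch-lightning"]
for name, ver in installed_versions(packages).items():
    print(f"{name}=={ver}" if ver else f"{name} is not installed")
```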

@Borda Is there some reason why we pin these versions in notebooks? I'm thinking we might want to unpin these versions so that notebooks are always up-to-date (unless some of the packages must be pinned for compatibility).

akihironitta avatar Jul 28 '22 18:07 akihironitta

Is there some reason why we pin these versions in notebooks? I'm thinking we might want to unpin these versions so that notebooks are always up-to-date (unless some of the packages must be pinned for compatibility).

I think they are pinned as part of each notebook's .meta.yml, as you can see in the example below, so if you want to unpin them, we should update it there... :otter: https://github.com/Lightning-AI/tutorials/blob/4c13fe1f922eff0f172b05c01d098308a1c86575/templates/img-classify/.meta.yml#L11-L15

Borda avatar Jan 03 '23 11:01 Borda

This should be addressed by #210, #232, #233, #236, and #237.

Borda avatar Mar 19 '23 08:03 Borda