Import error on shutdown/KeyboardInterrupt if run from Jupyter Lab notebook cell
Bug description
An error (NameError: name 'exit' is not defined) is raised on shutdown/KeyboardInterrupt when trainer.fit is run from a Jupyter Lab notebook cell. When run from a script, everything works fine.
What version are you seeing the problem on?
v2.4
How to reproduce the bug
Run trainer.fit from a Jupyter notebook cell, then click stop in Jupyter notebook.
print("---start train---")
trainer.fit(model, train_dataloader, ckpt_path=ckpt_path)
Error messages and logs
Detected KeyboardInterrupt, attempting graceful shutdown ...
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
~/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
45 if trainer.strategy.launcher is not None:
---> 46 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
47 return trainer_fn(*args, **kwargs)
~/.local/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/multiprocessing.py in launch(self, function, trainer, *args, **kwargs)
143 self.procs = process_context.processes
--> 144 while not process_context.join():
145 pass
~/.local/lib/python3.10/site-packages/torch/multiprocessing/spawn.py in join(self, timeout)
117 # Wait for any process to fail or all of them to succeed.
--> 118 ready = multiprocessing.connection.wait(
119 self.sentinels.keys(),
/usr/lib/python3.10/multiprocessing/connection.py in wait(object_list, timeout)
930 while True:
--> 931 ready = selector.select(timeout)
932 if ready:
/usr/lib/python3.10/selectors.py in select(self, timeout)
415 try:
--> 416 fd_event_list = self._selector.poll(timeout)
417 except InterruptedError:
KeyboardInterrupt:
During handling of the above exception, another exception occurred:
NameError Traceback (most recent call last)
/tmp/ipykernel_2824/3752444865.py in <module>
189 ckpt_path = None
190 print("---start train---")
--> 191 trainer.fit(model, train_dataloader, ckpt_path=ckpt_path)
~/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py in fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
536 self.state.status = TrainerStatus.RUNNING
537 self.training = True
--> 538 call._call_and_handle_interrupt(
539 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
540 )
~/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
62 if isinstance(launcher, _SubprocessScriptLauncher):
63 launcher.kill(_get_sigkill_signal())
---> 64 exit(1)
65
66 except BaseException as exception:
NameError: name 'exit' is not defined
Environment
Current environment
- CUDA:
- GPU:
- NVIDIA A100-SXM4-40GB
- available: True
- version: 12.1
- GPU:
- Lightning:
- lightning: 2.4.0
- lightning-utilities: 0.11.7
- pytorch-lightning: 2.4.0
- torch: 2.4.1
- torch-summary: 1.4.5
- torchmetrics: 1.4.2
- torchvision: 0.15.2
- Packages:
- absl-py: 0.15.0
- aiohappyeyeballs: 2.4.3
- aiohttp: 3.10.8
- aiosignal: 1.3.1
- aiosqlite: 0.19.0
- annotated-types: 0.6.0
- anyio: 4.1.0
- appdirs: 1.4.4
- argon2-cffi: 21.1.0
- arrow: 1.3.0
- astunparse: 1.6.3
- async-lru: 2.0.4
- async-timeout: 4.0.3
- attrs: 23.1.0
- automat: 20.2.0
- babel: 2.13.1
- backcall: 0.2.0
- bcrypt: 3.2.0
- beautifulsoup4: 4.10.0
- beniget: 0.4.1
- bleach: 4.1.0
- blinker: 1.4
- bottle: 0.12.19
- bottleneck: 1.3.2
- brotli: 1.0.9
- cachetools: 5.0.0
- certifi: 2020.6.20
- cffi: 1.15.0
- chardet: 4.0.0
- charset-normalizer: 3.3.2
- click: 8.0.3
- cloud-init: 23.3.3
- colorama: 0.4.4
- comm: 0.2.0
- command-not-found: 0.3
- configobj: 5.0.6
- constantly: 15.1.0
- cryptography: 3.4.8
- ctop: 1.0.0
- cycler: 0.11.0
- dacite: 1.8.1
- dbus-python: 1.2.18
- debugpy: 1.8.0
- decorator: 4.4.2
- defusedxml: 0.7.1
- distlib: 0.3.4
- distro: 1.7.0
- distro-info: 1.1+ubuntu0.1
- docker: 5.0.3
- entrypoints: 0.4
- et-xmlfile: 1.0.1
- exceptiongroup: 1.2.0
- fastjsonschema: 2.19.0
- filelock: 3.6.0
- flake8: 4.0.1
- flatbuffers: 1.12.1-git20200711.33e2d80-dfsg1-0.6
- fonttools: 4.29.1
- fqdn: 1.5.1
- frozenlist: 1.4.1
- fs: 2.4.12
- fsspec: 2024.9.0
- future: 0.18.2
- gast: 0.5.2
- glances: 3.2.4.2
- google-auth: 1.5.1
- google-auth-oauthlib: 0.4.2
- google-pasta: 0.2.0
- grpcio: 1.30.2
- h5py: 3.6.0
- h5py.-debian-h5py-serial: 3.6.0
- html5lib: 1.1
- htmlmin: 0.1.12
- httplib2: 0.20.2
- huggingface-hub: 0.25.1
- hyperlink: 21.0.0
- icdiff: 2.0.4
- idna: 3.3
- imagehash: 4.3.1
- importlib-metadata: 4.6.4
- incremental: 21.3.0
- influxdb: 5.3.1
- iniconfig: 1.1.1
- iotop: 0.6
- ipykernel: 6.7.0
- ipython: 7.31.1
- ipython-genutils: 0.2.0
- ipywidgets: 8.1.1
- isoduration: 20.11.0
- jax: 0.4.14
- jaxlib: 0.4.14
- jdcal: 1.0
- jedi: 0.18.0
- jeepney: 0.7.1
- jinja2: 3.0.3
- joblib: 0.17.0
- json5: 0.9.14
- jsonpatch: 1.32
- jsonpointer: 2.0
- jsonschema: 4.20.0
- jsonschema-specifications: 2023.11.2
- jupyter-client: 8.6.0
- jupyter-console: 6.4.0
- jupyter-core: 5.5.0
- jupyter-events: 0.9.0
- jupyter-lsp: 2.2.1
- jupyter-server: 2.12.0
- jupyter-server-fileid: 0.9.0
- jupyter-server-terminals: 0.4.4
- jupyter-ydoc: 1.1.1
- jupyterlab: 4.0.9
- jupyterlab-pygments: 0.1.2
- jupyterlab-server: 2.25.2
- jupyterlab-widgets: 3.0.9
- kaptan: 0.5.12
- keras: 2.13.1
- keyring: 23.5.0
- kiwisolver: 1.3.2
- launchpadlib: 1.10.16
- lazr.restfulclient: 0.14.4
- lazr.uri: 1.0.6
- libtmux: 0.10.1
- lightning: 2.4.0
- lightning-utilities: 0.11.7
- llvmlite: 0.41.1
- lxml: 4.8.0
- lz4: 3.1.3+dfsg
- markdown: 3.3.6
- markupsafe: 2.0.1
- matplotlib: 3.5.1
- matplotlib-inline: 0.1.3
- mccabe: 0.6.1
- mistune: 3.0.2
- ml-dtypes: 0.2.0
- more-itertools: 8.10.0
- mpmath: 0.0.0
- msgpack: 1.0.3
- multidict: 6.1.0
- multimethod: 1.10
- nbclient: 0.5.6
- nbconvert: 7.12.0
- nbformat: 5.9.2
- nest-asyncio: 1.5.4
- netifaces: 0.11.0
- networkx: 2.4
- nose: 1.3.7
- notebook: 6.4.8
- notebook-shim: 0.2.3
- numba: 0.58.1
- numexpr: 2.8.1
- numpy: 1.25.2
- nvidia-cublas-cu12: 12.1.3.1
- nvidia-cuda-cupti-cu12: 12.1.105
- nvidia-cuda-nvrtc-cu12: 12.1.105
- nvidia-cuda-runtime-cu12: 12.1.105
- nvidia-cudnn-cu12: 9.1.0.70
- nvidia-cufft-cu12: 11.0.2.54
- nvidia-curand-cu12: 10.3.2.106
- nvidia-cusolver-cu12: 11.4.5.107
- nvidia-cusparse-cu12: 12.1.0.106
- nvidia-ml-py3: 7.352.0
- nvidia-nccl-cu12: 2.20.5
- nvidia-nvjitlink-cu12: 12.6.77
- nvidia-nvtx-cu12: 12.1.105
- oauthlib: 3.2.0
- odfpy: 1.4.2
- olefile: 0.46
- openpyxl: 3.0.9
- opt-einsum: 3.3.0
- overrides: 7.4.0
- packaging: 21.3
- pandas: 1.3.5
- pandas-profiling: 3.6.6
- pandocfilters: 1.5.0
- parso: 0.8.1
- patsy: 0.5.4
- pexpect: 4.8.0
- phik: 0.12.3
- pickleshare: 0.7.5
- pillow: 9.0.1
- pip: 23.3.1
- platformdirs: 2.5.1
- pluggy: 0.13.0
- ply: 3.11
- prometheus-client: 0.9.0
- prompt-toolkit: 3.0.28
- protobuf: 4.21.12
- psutil: 5.9.0
- ptyprocess: 0.7.0
- py: 1.10.0
- pyasn1: 0.4.8
- pyasn1-modules: 0.2.1
- pycodestyle: 2.8.0
- pycparser: 2.21
- pycryptodomex: 3.11.0
- pydantic: 2.5.2
- pydantic-core: 2.14.5
- pyflakes: 2.4.0
- pygments: 2.11.2
- pygobject: 3.42.1
- pyhamcrest: 2.0.2
- pyinotify: 0.9.6
- pyjwt: 2.3.0
- pyopenssl: 21.0.0
- pyparsing: 2.4.7
- pyrsistent: 0.18.1
- pyserial: 3.5
- pysmi: 0.3.2
- pysnmp: 4.4.12
- pystache: 0.6.0
- pytest: 6.2.5
- python-apt: 2.4.0+ubuntu2
- python-dateutil: 2.8.2
- python-debian: 0.1.43+ubuntu1.1
- python-json-logger: 2.0.7
- python-magic: 0.4.24
- pythran: 0.10.0
- pytorch-lightning: 2.4.0
- pytz: 2022.1
- pywavelets: 1.5.0
- pyyaml: 5.4.1
- pyzmq: 25.1.2
- referencing: 0.31.1
- regex: 2024.9.11
- requests: 2.31.0
- requests-oauthlib: 1.3.0
- rfc3339-validator: 0.1.4
- rfc3986-validator: 0.1.1
- rpds-py: 0.13.2
- rsa: 4.8
- safetensors: 0.4.5
- scikit-learn: 0.23.2
- scipy: 1.8.0
- seaborn: 0.12.2
- secretstorage: 3.3.1
- send2trash: 1.8.2
- service-identity: 18.1.0
- setuptools: 59.6.0
- simplejson: 3.17.6
- six: 1.16.0
- sniffio: 1.3.0
- sos: 4.5.6
- soupsieve: 2.3.1
- ssh-import-id: 5.11
- statsmodels: 0.14.0
- sympy: 1.9
- systemd-python: 234
- tables: 3.7.0
- tangled-up-in-unicode: 0.2.0
- tensorboard: 2.13.0
- tensorflow: 2.13.1
- tensorflow-estimator: 2.13.0
- termcolor: 1.1.0
- terminado: 0.13.1
- testpath: 0.5.0
- threadpoolctl: 3.1.0
- tinycss2: 1.2.1
- tmuxp: 1.9.2
- tokenizers: 0.20.0
- toml: 0.10.2
- tomli: 2.0.1
- torch: 2.4.1
- torch-summary: 1.4.5
- torchmetrics: 1.4.2
- torchvision: 0.15.2
- tornado: 6.4
- tqdm: 4.66.1
- traitlets: 5.14.0
- transformers: 4.45.1
- triton: 3.0.0
- twisted: 22.1.0
- typeguard: 4.1.5
- types-python-dateutil: 2.8.19.14
- typing-extensions: 4.8.0
- ubuntu-advantage-tools: 8001
- ufolib2: 0.13.1
- ufw: 0.36.1
- unattended-upgrades: 0.1
- unicodedata2: 14.0.0
- uri-template: 1.3.0
- urllib3: 1.26.5
- virtualenv: 20.13.0+ds
- visions: 0.7.5
- wadllib: 1.3.6
- wcwidth: 0.2.5
- webcolors: 1.13
- webencodings: 0.5.1
- websocket-client: 1.2.3
- werkzeug: 2.0.2
- wheel: 0.37.1
- widgetsnbextension: 4.0.9
- wordcloud: 1.9.2
- wrapt: 1.13.3
- xlwt: 1.3.0
- y-py: 0.6.2
- yarl: 1.13.1
- ydata-profiling: 4.6.3
- ypy-websocket: 0.12.4
- zipp: 1.0.0
- zope.interface: 5.4.0
- System:
- OS: Linux
- architecture:
- 64bit
- ELF
- processor: x86_64
- python: 3.10.12
- release: 6.2.0-37-generic
- version: #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 2 18:01:13 UTC 2
#- PyTorch Lightning Version (e.g., 2.4.0): 2.4.0
#- PyTorch Version (e.g., 2.4): 2.4.1+cu121
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration: 1xA100 40GB
#- How you installed Lightning(`conda`, `pip`, source): pip install lightning
More info
No response
Avoid exit(1): In a Jupyter environment, exit() can cause problems. Calling exit works in standard Python scripts, but it should not be called in Jupyter notebooks. You can use sys.exit() instead:
import sys
sys.exit(1)
However, the recommended approach is to avoid using exit() or sys.exit() directly, especially in Jupyter notebook environments, where these commands can interrupt the kernel process and cause unnecessary problems.
@nocoding03 My code/notebook does not use or call exit. The problem is in the PyTorch Lightning module.
If you double-check the provided traceback, you will see that the error comes from the ~/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py module.
I also see this issue with lightning v2.4.0 and torch v2.5.1 while training in a Jupyter notebook. When I stop the training run, instead of a graceful shutdown I get this error:
NameError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
62 if isinstance(launcher, _SubprocessScriptLauncher):
63 launcher.kill(_get_sigkill_signal())
---> 64 exit(1)
65
66 except BaseException as exception:
NameError: name 'exit' is not defined
Seems to be an issue with Lightning not importing exit from sys, so the exit(1) call fails because exit is not defined.
Same issue in 2.5.0, but there it even fails when defining the trainer and kills the kernel.
What's the status of this? The bug was reported 5 months ago in that specific branch https://github.com/Lightning-AI/pytorch-lightning/pull/19976 authored by @awaelchli and approved by @lantiga. There seems to be no activity on fixing this. My understanding is that importing exit from sys should be sufficient to fix it, but I might be missing something.
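To illustrate the suggestion above, here is a minimal sketch of the difference between a bare exit(1) and sys.exit(1). It is not Lightning's code. It assumes that the notebook kernel does not expose exit as a builtin (which is what the traceback suggests) and simulates that by temporarily removing the name from builtins. The function names are hypothetical.

```python
import builtins
import sys


def shutdown_with_bare_exit():
    # Mirrors the failing pattern: relies on the name `exit` being resolvable.
    exit(1)


def shutdown_with_sys_exit():
    # sys.exit is a regular function of the sys module and is always importable.
    sys.exit(1)


def outcome(fn):
    """Return the name of the exception type raised by fn."""
    try:
        fn()
    except BaseException as exc:  # SystemExit is not a subclass of Exception
        return type(exc).__name__
    return "no exception"


# Simulate an environment where `exit` is not available as a builtin.
saved_exit = builtins.__dict__.pop("exit", None)
try:
    without_builtin = outcome(shutdown_with_bare_exit)
    with_sys = outcome(shutdown_with_sys_exit)
finally:
    # Restore the builtin so the rest of the session is unaffected.
    if saved_exit is not None:
        builtins.exit = saved_exit

print(without_builtin, with_sys)
```

Under that assumption the bare call fails with NameError before anything can exit, while sys.exit(1) raises SystemExit as intended.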
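Use this for now:
import gc
import torch

try:
    trainer.fit(model, train_loader, val_loader)
except NameError:
    # Lightning's interrupt handler calls exit(1), which is undefined in the notebook kernel.
    # Swallow that error and release memory held by the interrupted run.
    gc.collect()
    torch.cuda.empty_cache()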
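A slightly narrower variant of this workaround, as a sketch only: the helper below (run_ignoring_missing_exit is a hypothetical name, not part of Lightning) swallows the NameError only when it refers to the missing exit name and re-raises any other NameError, so genuine typos in the notebook are not hidden.

```python
def run_ignoring_missing_exit(fn, *args, **kwargs):
    """Call fn and swallow only the NameError caused by the missing `exit` name.

    Returns a (result, interrupted) tuple.
    """
    try:
        return fn(*args, **kwargs), False
    except NameError as err:
        # NameError.name is set by the interpreter on Python 3.10+;
        # fall back to the message text otherwise.
        missing = getattr(err, "name", None)
        if missing == "exit" or "name 'exit' is not defined" in str(err):
            return None, True
        raise


# Hypothetical usage in the notebook cell:
# _, interrupted = run_ignoring_missing_exit(trainer.fit, model, train_loader, val_loader)
# if interrupted:
#     gc.collect()
#     torch.cuda.empty_cache()
```

This only hides the symptom in the notebook; the interrupted run is still not shut down by Lightning's own handler.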