Kernel crash when using tensorflow/pytorch notebook image

Open mthiboust opened this issue 4 months ago • 4 comments

What docker image(s) are you using?

pytorch-notebook, tensorflow-notebook

Host OS system

Ubuntu 23.10

Host architecture

x86_64

What Docker command are you running?

docker run -it --rm -p 8888:8888 quay.io/jupyter/tensorflow-notebook:tensorflow-2.16.1
docker run -it --rm -p 8888:8888 quay.io/jupyter/pytorch-notebook:pytorch-2.2.2

How to Reproduce the problem?

It is hard to provide a full minimal working example because the bug occurs when training a specific deep learning model on CPU via Keras, and that model is not easy to reduce. It only happens when I run my code in the jupyter/tensorflow-notebook and jupyter/pytorch-notebook images, not when I run it directly on my system.

I have an easy workaround (defining the Keras loss as a function instead of a class instance), but I thought you would be interested to know about this weird behavior.

See this Keras issue for more context: https://github.com/keras-team/keras/issues/19601

Command output

No response

Expected behavior

No response

Actual behavior

Kernel crashes

Anything else?

My code is run by a JupyterLab server (using the latest official Docker images jupyter/tensorflow-notebook and jupyter/pytorch-notebook from jupyter/docker-stacks), and I connect to it via the vscode-jupyter extension.

The crash is triggered by the model.fit() call. It happens within a few seconds when using the torch backend, and somewhat later (after a few epochs) with the tensorflow backend. There is no explicit error message I can share with you.
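Since the kernel dies without a message, one way to collect some evidence is Python's standard `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal such as SIGSEGV. The following is only a minimal sketch: it assumes the crash is a native fault rather than an out-of-memory kill (SIGKILL cannot be caught), and the helper name `enable_crash_log` and the log file name are hypothetical.

```python
import faulthandler
import os
import tempfile


def enable_crash_log(path=None):
    """Route fatal-signal tracebacks to a file and return the open file handle.

    The handle must stay open for as long as faulthandler is enabled,
    so the caller should keep a reference to it.
    """
    if path is None:
        # Hypothetical default location; any writable path works.
        path = os.path.join(tempfile.gettempdir(), "kernel_crash.log")
    log_file = open(path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Run this in the first notebook cell, before calling model.fit().
crash_log = enable_crash_log()
```

If the kernel then crashes during training, the file named by `crash_log.name` may contain the Python stack of each thread at the time of the fault, which can be attached to the issue.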

According to this link, the root cause could be a buggy installation of tensorflow/pytorch due to mixing pip and conda packages (the official Jupyter image installs tensorflow via pip, while the other packages are installed via mamba/conda).
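To check whether pip and conda packages are actually mixed inside the container, the installer recorded in each package's metadata can be inspected with the standard `importlib.metadata` module. This is a rough sketch, not a diagnosis: the helper name `installer_of` and the list of package names are illustrative assumptions, and it relies on the `INSTALLER` metadata file, which pip writes and which conda-built packages commonly (but not always) set to `conda`.

```python
from importlib import metadata


def installer_of(package):
    """Return the tool recorded in the package's INSTALLER file, or None if unknown."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None  # package is not installed in this environment
    text = dist.read_text("INSTALLER")  # None when the file is absent
    return text.strip() if text else None


# Illustrative selection of packages; adjust to the ones your model imports.
for name in ("tensorflow", "torch", "keras", "numpy"):
    print(f"{name}: {installer_of(name)}")
```

A mix of `pip` and `conda` in the output would be consistent with the suspected cause, although it does not prove that the mix is what crashes the kernel.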

Latest Docker version

  • [x] I've updated my Docker version to the latest available, and the issue persists

mthiboust • Apr 23 '24 21:04