docker-stacks
Kernel crash when using tensorflow/pytorch notebook image
What docker image(s) are you using?
pytorch-notebook, tensorflow-notebook
Host OS system
Ubuntu 23.10
Host architecture
x86_64
What Docker command are you running?
docker run -it --rm -p 8888:8888 quay.io/jupyter/tensorflow-notebook:tensorflow-2.16.1
docker run -it --rm -p 8888:8888 quay.io/jupyter/pytorch-notebook:pytorch-2.2.2
How to Reproduce the problem?
It is hard to give a full minimal working example because the bug happens when training a specific DL model on CPU via Keras, and that model is not easy to reduce. It only happens when running my code via the jupyter/tensorflow-notebook and jupyter/pytorch-notebook images (not when I run the same code directly on my system).
I have an easy workaround (defining the Keras loss via a function instead of a class instance, as sketched below), but I thought you would be interested to know about this weird behavior.
See this Keras issue for more context: https://github.com/keras-team/keras/issues/19601
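For illustration, here is roughly what the two variants look like (a simplified placeholder model and loss, not the actual code that triggers the crash):

```python
import numpy as np
import keras


# Variant that crashes the kernel in my setup: the loss is passed as a Loss subclass instance.
class MyLoss(keras.losses.Loss):
    def call(self, y_true, y_pred):
        return keras.ops.mean(keras.ops.square(y_true - y_pred), axis=-1)


# Workaround: the same loss defined as a plain function.
def my_loss(y_true, y_pred):
    return keras.ops.mean(keras.ops.square(y_true - y_pred), axis=-1)


# Placeholder model and data; the real model is more involved and trains on CPU.
model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])
model.compile(optimizer="adam", loss=my_loss)  # loss=MyLoss() is the variant that crashes for me
model.fit(np.random.rand(64, 4), np.random.rand(64, 1), epochs=2, verbose=0)
```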
Command output
No response
Expected behavior
No response
Actual behavior
Kernel crashes
Anything else?
My code is run by a JupyterLab server (using the latest official Docker images jupyter/tensorflow-notebook and jupyter/pytorch-notebook from jupyter/docker-stacks), and I connect to it via the vscode-jupyter extension.
The crash is caused by the model.fit() call. It happens within a few seconds when using the torch backend, and a bit later with the tensorflow backend (after a few epochs), but there is no explicit error message I can share with you.
According to this link, the root cause could be a buggy installation of TensorFlow/PyTorch due to mixing pip and conda packages (the official Jupyter images install tensorflow via pip while the other packages are installed via mamba/conda).
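For reference, running the following in the container's terminal should show which of these packages were installed from pip (they report pypi in the channel column of mamba list):
mamba list | grep -iE "tensorflow|keras|torch"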
Latest Docker version
- [x] I've updated my Docker version to the latest available, and the issue persists
@mthiboust Are you using the latest versions of the images?
You can add --pull=always to the docker run command.
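For example, reusing the command from the issue description:
docker run -it --rm --pull=always -p 8888:8888 quay.io/jupyter/tensorflow-notebook:tensorflow-2.16.1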
Unfortunately, we were not able to install tensorflow using mamba last time we tried (it might have changed).
Could you please check, maybe it started working?
I would be happy to switch to mamba.
I wouldn't touch the CUDA version though; I don't think it's going to work with mamba.
I am using the tensorflow-notebook:tensorflow-2.16.1 and pytorch-notebook:pytorch-2.2.2 images from 2 days ago (initial post edited).
For your info, this bug is not present when using the older jupyter/tensorflow-notebook:tensorflow-2.14.0 image. I will let you know if I find other insights to better understand this weird behavior.
Is it an option to not use conda/mamba in the Jupyter images? I may try it on my side to verify whether it corrects the issue.
@mthiboust Then I suggest using my/b-data's CUDA-enabled JupyterLab Python docker stack.
What makes this project different:
- Multi-arch: linux/amd64, linux/arm64/v8
- Derived from nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
  - including development libraries and headers
- TensorRT and TensorRT plugin libraries
  - including development libraries and headers
- IDE: code-server next to JupyterLab
- Just Python – no Conda / Mamba
Python 3.12 images will be updated to CUDA 12.4.0 today and be compatible with PyTorch ≥ 2.2 and TensorFlow ≥ 2.16[.1].
Python 3.11 images will remain with CUDA 11.8.0 and be compatible with PyTorch ≥ 2.0 and TensorFlow 2.12 - 2.14.
Closing this issue as it appears that the cause is on the Keras side (cf. https://github.com/keras-team/keras/issues/19601).