How to debug cause of "ImportError when running Container"
I am a bit stuck: I'm trying to deploy a PyTorch model which runs fine on my local computer but hangs when I try to deploy it. The model's Docker container gives me this error log:
$ docker logs 4437076d479c
Starting PyTorchContainer container
Connecting to Clipper with default port: 7000
Encountered an ImportError when running container. You can use the pkgs_to_install argument when calling clipper_admin.build_model() to supply any needed Python packages.
It seems like it is not able to import some dependency. I installed the same model in a clean Anaconda environment on my computer and checked exactly which dependencies I needed to install. I listed these in pkgs_to_install. I have inspected all the log files of the Docker containers and I can't find the actual error message which could give me a hint as to which dependency is missing. Could an error log be getting swallowed somewhere?
The model I'm trying to run is the xinntao/ESRGAN for image super-resolution.
Dependencies:
- Python 3
- PyTorch >= 0.4.0
- Python packages: `pip install numpy opencv-python`
The command to deploy the model:
deploy_pytorch_model(
    clipper_conn,
    name="superresolution-model",
    version=1,
    input_type="bytes",
    func=image_enhance,
    pytorch_model=model,
    pkgs_to_install=['opencv-python', 'numpy', 'six', 'Pillow', 'wheel'],
)
clipper_service.py is where I define the model and attempt the deployment.
Hi, thanks for raising this issue.
The import error occurs because the script imports architecture.py, and architecture.py in turn imports block.py. Cloudpickle (the package we use to serialize the function) is not able to pick up and bundle these scripts directly. Basically, when we deserialize the script in the container, it will run something like `import architecture as arch`, and then architecture will run `import block as B`.
If you consolidate all three files, architecture.py, block.py, and clipper_service.py, into a single clipper_service file, you should be good to go.
In a few days, we will add a feature where you can directly deploy an ONNX model (which you can export from PyTorch using one line of code). This will be much easier.
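The underlying mechanism can be illustrated with the standard library alone: pickle (and cloudpickle, for anything defined in an importable module) serializes module-level functions *by reference*, storing only the module and qualified name, so the same module must be importable wherever the payload is loaded. A minimal sketch, not Clipper-specific:

```python
import pickle
import json

# A module-level function is pickled by reference: the payload records
# ("json", "dumps"), not the function's bytecode. Unpickling re-imports json.
payload = pickle.dumps(json.dumps)
restored = pickle.loads(payload)
assert restored is json.dumps

# If the referenced module did not exist on the loading side (as with a
# local architecture.py missing inside the container), loading would raise
# an ImportError instead.
print(restored({"ok": True}))  # → {"ok": true}
```

This is why consolidating everything into one file works: cloudpickle serializes code defined in the `__main__` script by value, so nothing needs to be re-imported in the container.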
Hello, @voodoohop. Happy new year!
First of all, thank you for your error report. Clipper installs the PyTorch library from PyPI, but the `pytorch` package on PyPI has not been updated since Mar 11, 2017, and 0.1.2 is the last version. Because xinntao/ESRGAN
depends on 'PyTorch >= 0.4.0', you have to create a custom model container to support a recent PyTorch. Please refer to this link: http://clipper.ai/tutorials/custom_model_container/.
For example, if you prefer Python 3.6 with no CUDA environment:
- Create a Dockerfile like this:
FROM clipper/python36-closure-container:0.3
RUN pip install https://download.pytorch.org/whl/cpu/torch-1.0.0-cp36-cp36m-linux_x86_64.whl torchvision
- Build a Docker image:
$ docker build -t custom-model-image .
- Deploy your model with `base_image`:
deploy_pytorch_model(
    clipper_conn,
    name="superresolution-model",
    version=1,
    input_type="bytes",
    func=image_enhance,
    pytorch_model=model,
    base_image='custom-model-image',
    pkgs_to_install=['opencv-python', 'numpy', 'six', 'Pillow', 'wheel'],
)
Also see http://clipper.ai/tutorials/custom_model_container/.
Thank you guys for your fast response... @simon-mo I had actually tried consolidating those 3 files into one and had not gotten any further. But it's good to know I need to do that. I still haven't completely wrapped my head around what gets run where.
I will try building a custom model container with the latest pytorch and consolidate the files into one and let you know if it works. Thanks!
To be a bit more clear: if you are building a custom container, you don't need to consolidate the files, because when building custom containers you can just copy the files over.
OK, so I created a Dockerfile:
FROM clipper/python36-closure-container:0.3
RUN pip install https://download.pytorch.org/whl/cpu/torch-1.0.0-cp36-cp36m-linux_x86_64.whl torchvision opencv-python numpy six Pillow wheel certifi
Then created the custom-model-image.
I also consolidated architecture.py and block.py into clipper_service.py.
Tried deploying the model with this command:
deploy_pytorch_model(
    clipper_conn,
    name="superresolution-model",
    version=1,
    input_type="bytes",
    func=image_enhance,
    pytorch_model=model,
    base_image='custom-model-image',
    pkgs_to_install=['opencv-python', 'numpy', 'six', 'Pillow', 'wheel'],
)
But still the same error. I also tried copying the 3 python files to the Docker container.
Is there no way to get a more verbose error output than just the "ImportError when running Container"?
Here is the console output:
$ python3 clipper_service.py
Model path models/RRDB_ESRGAN_x4.pth.
Testing...
19-01-04:18:36:05 INFO [clipper_admin.py:1258] Stopped all Clipper cluster and all model containers
19-01-04:18:36:05 INFO [docker_container_manager.py:119] Starting managed Redis instance in Docker
19-01-04:18:36:12 INFO [clipper_admin.py:126] Clipper is running
19-01-04:18:36:12 INFO [clipper_admin.py:201] Application superresolution was successfully registered
going to deploy...
19-01-04:18:36:12 INFO [deployer_utils.py:44] Saving function to /tmp/clipper/tmptfqw7oxn
19-01-04:18:36:12 INFO [deployer_utils.py:54] Serialized and supplied predict function
19-01-04:18:36:14 INFO [pytorch.py:204] Torch model saved
19-01-04:18:36:14 INFO [clipper_admin.py:452] Building model Docker image with model data from /tmp/clipper/tmptfqw7oxn
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Step 1/3 : FROM custom-model-image'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': '\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': ' ---> 24d153fccc69\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Step 2/3 : COPY /tmp/clipper/tmptfqw7oxn /model/'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': '\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': ' ---> ba63ee1aef7b\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Step 3/3 : RUN apt-get -y install build-essential && pip install opencv-python numpy six Pillow wheel'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': '\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': ' ---> Running in 1dba722e12ec\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Reading package lists...'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': '\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Building dependency tree...'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': '\nReading state information...'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': '\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'build-essential is already the newest version (12.3).\n0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Requirement already satisfied: opencv-python in /usr/local/lib/python3.6/site-packages (3.4.5.20)\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Requirement already satisfied: numpy in /usr/local/lib/python3.6/site-packages (1.14.3)\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Requirement already satisfied: six in /usr/local/lib/python3.6/site-packages (1.11.0)\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Requirement already satisfied: Pillow in /usr/local/lib/python3.6/site-packages (5.4.0)\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Requirement already satisfied: wheel in /usr/local/lib/python3.6/site-packages (0.31.0)\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': "\x1b[91mYou are using pip version 10.0.1, however version 18.1 is available.\nYou should consider upgrading via the 'pip install --upgrade pip' command.\n\x1b[0m"}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': ' ---> 256dc60d76f8\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'aux': {'ID': 'sha256:256dc60d76f8785398a8167b1fbc36d9ace79f3811d6879b9ca709e2d3abdae9'}}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Successfully built 256dc60d76f8\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:456] {'stream': 'Successfully tagged superresolution-model:1\n'}
19-01-04:18:36:24 INFO [clipper_admin.py:458] Pushing model Docker image to superresolution-model:1
19-01-04:18:36:25 INFO [docker_container_manager.py:257] Found 0 replicas for superresolution-model:1. Adding 1
Is it possible to push superresolution-model:1 to Docker Hub? To do that:
- `docker tag superresolution-model:1 <docker_user_name>/superresolution-model:1`
Here it is: https://cloud.docker.com/repository/docker/voodoohop/superresolution-model
thanks
The issue is opencv:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 908, in subimport
__import__(name)
File "/usr/local/lib/python3.6/site-packages/cv2/__init__.py", line 3, in <module>
from .cv2 import *
ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory
opencv-python requires several native packages at runtime. The fix is to add the following line to your custom model Dockerfile:
RUN apt-get update && apt-get install -y libsm6 libxext6 libxrender1 libglib2.0-0
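Putting the pieces together, the full custom Dockerfile would then look something like this (same base image, wheel URL, and pip packages as earlier in the thread; the extra apt packages supply the shared libraries that cv2 links against):

```dockerfile
FROM clipper/python36-closure-container:0.3

# Native libraries that opencv-python's cv2 binary needs at import time
RUN apt-get update && apt-get install -y libsm6 libxext6 libxrender1 libglib2.0-0

RUN pip install https://download.pytorch.org/whl/cpu/torch-1.0.0-cp36-cp36m-linux_x86_64.whl \
    torchvision opencv-python numpy six Pillow wheel certifi
```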
That solved the problem! Thank you so much.
Is there a log file where I could have found that error?
Not yet. I had to go inside the container and try to load the function myself to see the issue.
To do that, I did:
docker run -it --entrypoint "/bin/bash" superresolution-model:1
and then, in a Python shell:
>>> import cloudpickle
>>> func = cloudpickle.load(open('/model/func.pkl','rb'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 908, in subimport
__import__(name)
File "/usr/local/lib/python3.6/site-packages/cv2/__init__.py", line 3, in <module>
from .cv2 import *
ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory
We are adding more logging and debugging support in the next few releases; people are actively working on this.
OK! Once I have a better understanding of how the containers work, I could try to contribute a fix too, but it will still take a little while. Thanks again!
What are the limitations of cloudpickle, exactly? Will `from architecture import arch` always cause it to fail? Or will it only fail if architecture.py imports other blocks from another local file?
I'm trying to let data scientists deploy models from experiment repos and would like to avoid having the NN architecture definitions and Clipper deployments in one giant file.
Edit: I figured this out on my own through testing. It looks like cloudpickle doesn't handle any local imports at all. In order to avoid having mega-deploy files with architectures redefined in them, we have to either:
- copy the architecture files into the Docker container manually, or
- include the architecture definitions in a separate package and pip-install that package with the `pkgs_to_install` arg on the deployment functions (this sidesteps the need for custom Docker containers, but makes versioning more awkward, as the extra package must be maintained).
ONNX isn't a clean solution for us, since we lose the ability to do inference on mixed-size inputs. Still looking for a more elegant solution.
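For the second option, one possible layout (all names here are hypothetical, not from this thread) is a tiny installable package holding only the architecture code:

```
model_archs/
├── setup.py              # name="model_archs"; bump the version per release
└── model_archs/
    ├── __init__.py
    └── architecture.py   # RRDB / block definitions live here
```

The service file then does `from model_archs import architecture`, and because the package is listed in `pkgs_to_install` (with pip pointed at an internal index or a VCS URL), the container can re-import it when cloudpickle deserializes the function.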
@simon-mo @withsmilo Handling some packages in the current Clipper is a little awkward. We can probably improve this part later. I will label this issue as check-again.
We are adding more logging and debugging support in the next few release, there are people actively working on this.
+1 for this feature