vHive
CNN serving image build is broken
Describe the bug: The CNN serving image build is broken due to an update in a dependency.
To Reproduce: The error appears in the nightly tests.
Expected behavior: Successful build.
Logs: https://github.com/ease-lab/vhive/runs/2436625774?check_suite_focus=true#step:3:133
Workaround
Use the pre-built image vhiveease/cnn_serving on Docker Hub.
I tried rebuilding this recently because I changed my copy of server.py to suit my use case. The Docker build reports untrusted-signature errors at these lines:
echo "http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories && \
echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories && \
Eventually this line fails as well:
apk add --allow-untrusted --repository http://dl-3.alpinelinux.org/alpine/edge/testing hdf5 hdf5-dev && \
I tried rebuilding the base image (tatsushid/alpine-py3-tensorflow-jupyter) since that was a suggestion in one of the search results. That fails with this message:
ERROR: /tmp/bazel-0.7.0/src/java_tools/junitrunner/java/com/google/testing/coverage/BUILD:29:1: Building src/java_tools/junitrunner/java/com/google/testing/coverage/JacocoCoverage.jar (9 source files) failed (Exit 1): java failed: error executing command
(cd /tmp/bazel_XXMiCOln/out/execroot/io_bazel && \
exec env - \
LC_CTYPE=en_US.UTF-8 \
/usr/lib/jvm/java-1.8-openjdk/bin/java -XX:+TieredCompilation '-XX:TieredStopAtLevel=1' -Xbootclasspath/p:third_party/java/jdk/langtools/javac-9-dev-r4023-3.jar -jar bazel-out/host/bin/src/java_tools/buildjar/java/com/google/devtools/build/buildjar/bootstrap_deploy.jar @bazel-out/local-opt/bin/src/java_tools/junitrunner/java/com/google/testing/coverage/JacocoCoverage.jar-2.params).
java.lang.InternalError: Cannot find requested resource bundle for locale en_US
There is a warning message that could be related:
WARNING: /tmp/bazel_XXMiCOln/out/external/bazel_tools/WORKSPACE:1: Workspace name in /tmp/bazel_XXMiCOln/out/external/bazel_tools/WORKSPACE (@io_bazel) does not match the name given in the repository's definition (@bazel_tools); this will cause a build error in future versions.
@adayaru I suggest rebuilding this image on top of python-slim instead of Alpine, as it is much easier to get right and to maintain in the long term. We don't plan to do this work for now, so your contribution is more than welcome.
Hi @ustiugov, @adayaru,
I tried to hack my way to a working build. The issue is due to the rotation of the keys since Alpine 3.15.
I manually downloaded the required packages and installed them in the Docker image. A working copy is here: 006bdda92b9eafa5b77cb97d3897062c26ae4806
However, the same problem is present in the rnn_serving image as well: it also fails to build due to the same issue, and I could not get that one to work.
@niravnshah could you please open a PR with the fix?
@ustiugov, I am not sure we can call this a fix :) I explicitly downloaded x86_64 packages, meaning it will not work on aarch64. Also, I am not sure that pinning fixed versions of the apk packages is a good idea (they may become stale, which is why we have package managers). I guess this needs to be looked at from the base image perspective, including whether the base image needs to, or should, be changed.
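To illustrate the architecture concern, here is a minimal sketch of how a build script could pick the package directory from the machine type instead of hard-coding x86_64. The function names `apk_arch_for` and `pkg_url`, the `PKG_BASE_URL` variable, and the `example.invalid` host are all hypothetical placeholders, not part of the vHive repository or of the commit above. It does not address the staleness of pinned packages.

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`)
# to the matching Alpine package architecture name.
apk_arch_for() {
    case "$1" in
        x86_64|amd64)  echo "x86_64" ;;
        aarch64|arm64) echo "aarch64" ;;
        *)             echo "unsupported machine: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build the download URL for a package file.
# PKG_BASE_URL is an assumed placeholder, not a real mirror.
pkg_url() {
    arch="$(apk_arch_for "$1")" || return 1
    echo "${PKG_BASE_URL:-https://example.invalid/alpine}/${arch}/$2"
}
```

In a Dockerfile `RUN` step this could be called as `pkg_url "$(uname -m)" some-package.apk`, where `some-package.apk` stands in for whichever file is being downloaded manually.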