NumberFormatException: Cannot parse null string when loading inside of Docker
Description
I am trying to pack a pre-trained PyTorch model into a Docker container together with an app written in Scala (model loader code). The app works fine when run normally, but in Docker I get the following error. It looks like DJL can actually find the model but can't read it. The container also runs fine when I use only DJL-provided models (such as MXNet's vgg16).
If I pass a wrong model path, it fails with `No model with the specified URI or the matching Input/Output type is found.`, so it does see the model when the name is correct.
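For context, the loader is roughly a `Criteria`-based load of the local TorchScript file. A minimal sketch of what it looks like (the paths, model name, and `ImageClassificationTranslator` setup here are illustrative assumptions, not the exact code):

```scala
import java.nio.file.Paths

import ai.djl.modality.Classifications
import ai.djl.modality.cv.Image
import ai.djl.modality.cv.transform.{Resize, ToTensor}
import ai.djl.modality.cv.translator.ImageClassificationTranslator
import ai.djl.repository.zoo.Criteria

object NsfwModelLoader {

  // Maps raw model output to the labels in synset.txt after a simple resize + to-tensor pipeline.
  private val translator = ImageClassificationTranslator.builder()
    .addTransform(new Resize(224, 224))
    .addTransform(new ToTensor())
    .optSynsetArtifactName("synset.txt")
    .build()

  // Criteria pointing at the local TorchScript file copied into the image.
  private val criteria = Criteria.builder()
    .setTypes(classOf[Image], classOf[Classifications])
    .optEngine("PyTorch")
    .optModelPath(Paths.get("/opt/app")) // directory holding nsfw_model.pt and synset.txt
    .optModelName("nsfw_model")          // resolves nsfw_model.pt
    .optTranslator(translator)
    .build()

  def load() = criteria.loadModel()
}
```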
Expected Behavior
Model is loaded successfully
Error Message
```
recognizer1 | java.lang.NumberFormatException: Cannot parse null string
recognizer1 |     at java.base/java.lang.Integer.parseInt(Integer.java:630)
recognizer1 |     at java.base/java.lang.Integer.parseInt(Integer.java:786)
recognizer1 |     at ai.djl.mxnet.zoo.nlp.embedding.GloveWordEmbeddingBlockFactory.newBlock(GloveWordEmbeddingBlockFactory.java:42)
recognizer1 |     at ai.djl.repository.zoo.BaseModelLoader.createModel(BaseModelLoader.java:202)
recognizer1 |     at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:159)
recognizer1 |     at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:172)
```
How to Reproduce?
I can upload the image, but it seems pretty useless. The error doesn't happen when running on my local machine.
Steps to reproduce
- Pack the pre-trained custom model together with the loading app into a Docker container;
- try to run it.
Environment Info
- OS: Fedora 39
- CPU: Core i7-7700HQ (x86-64)
- JDK: tried both 11 and 17
- Docker: 25.0.3
Here are my DJL dependencies:
```scala
// version "0.26.0"
val djl = Seq(
  "ai.djl"         % "api"               % Versions.djl,
  // mxnet is used in object detection for embedded vgg16
  "ai.djl.mxnet"   % "mxnet-model-zoo"   % Versions.djl,
  "ai.djl.mxnet"   % "mxnet-engine"      % Versions.djl,
  // pytorch for nsfw detection
  "ai.djl.pytorch" % "pytorch-engine"    % Versions.djl,
  "ai.djl.pytorch" % "pytorch-model-zoo" % Versions.djl
)
```
And the Dockerfile:
```dockerfile
FROM eclipse-temurin:17.0.6_10-jre-jammy
WORKDIR /opt/app
COPY ./target/scala-2.13/image-hosting-processing-recognizer-assembly-0.1.0-SNAPSHOT.jar ./
COPY synset.txt ./
COPY nsfw_model.pt ./
ENTRYPOINT ["java", "-cp", "image-hosting-processing-recognizer-assembly-0.1.0-SNAPSHOT.jar", "com.github.baklanovsoft.imagehosting.recognizer.Main"]
```
Oh, I guess I found the solution. I've made these changes to the Dockerfile:
```dockerfile
FROM eclipse-temurin:17-jre-jammy
WORKDIR /opt/app
COPY ./target/scala-2.13/image-hosting-processing-recognizer-assembly-0.1.0-SNAPSHOT.jar ./app.jar
RUN mkdir /opt/app/nsfw
ENTRYPOINT ["java", "-cp", "app.jar", "com.github.baklanovsoft.imagehosting.recognizer.Main"]
```
And started to mount the model in compose:
```yaml
    volumes:
      - recognizer1-djl-cache:/root/.djl.ai
      - "./recognizer/synset.txt:/opt/app/nsfw/synset.txt"
      - "./recognizer/nsfw_model.pt:/opt/app/nsfw/nsfw_model.pt"
```
But I guess the main reason is the subfolder: I had been putting the model in the same directory as the .jar file. I am not sure if it's a bug, but feel free to close this ticket if this is expected behavior.
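On the application side, the only change was pointing the `Criteria` at the dedicated model directory instead of the jar's working directory. A sketch, with the same caveat as above that the names are illustrative:

```scala
import java.nio.file.Paths

import ai.djl.modality.Classifications
import ai.djl.modality.cv.Image
import ai.djl.repository.zoo.Criteria

// Adjusted Criteria: the model and synset now live in their own directory
// (/opt/app/nsfw, mounted by compose) rather than next to the jar.
val criteria = Criteria.builder()
  .setTypes(classOf[Image], classOf[Classifications])
  .optEngine("PyTorch")
  .optModelPath(Paths.get("/opt/app/nsfw")) // dedicated model directory, not /opt/app
  .optModelName("nsfw_model")               // resolves nsfw_model.pt
  // .optTranslator(...)                    // same translator as in the earlier sketch
  .build()

val model = criteria.loadModel()
```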