Overlaybd did not block when networking was unavailable for a long time

Open shuaichang opened this issue 1 year ago • 2 comments

What happened in your environment?

We found a potential overlaybd bug: it returned incorrect data while the network was down. This can lead to application failures; in our case, Java failed to load a class.

What did you expect to happen?

When the network is down, class loading should block completely until the network recovers. Instead, we currently see "Exception: java.lang.NoClassDefFoundError" and "error reading zip file" after more than 3 minutes of retrying.

We suspect a bug in overlaybd: it returned an unexpected result when it should have blocked until the network recovered. This is based on the following experiments:

  1. We ran `systemctl stop overlaybd-tcmu`, after which the `jar` command hung until overlaybd-tcmu recovered.
  2. With a normal jar stored on a device-mapper block device, when we suspended IO on the DM device, the `jar` command hung until the IO suspension was removed.
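
For anyone repeating these experiments, here is a minimal sketch of a helper that tells "the command blocked" apart from "the command returned an error", which is the distinction this report is about. The name `classify_io` and the one-second polling are assumptions made for illustration, not part of overlaybd. It polls the child rather than killing it, because a process stuck in blocked IO may not react to signals.

```shell
#!/usr/bin/env bash
# classify_io LIMIT CMD [ARGS...]   (hypothetical helper, for illustration only)
# Starts CMD in the background and prints one word:
#   ok      - CMD exited with status 0 within LIMIT seconds
#   error   - CMD exited with a non-zero status within LIMIT seconds
#   blocked - CMD was still running after LIMIT seconds (it is left running)
classify_io() {
    local limit="$1"
    shift
    "$@" >/dev/null 2>&1 &
    local pid=$!
    local waited=0
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "blocked"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The child has exited; wait returns its exit status.
    if wait "$pid"; then
        echo "ok"
    else
        echo "error"
    fi
}
```

Inside the container this could be called as `classify_io 600 jar vft ./example.java.helloworld/Main.jar`. In the two control experiments above the expected result would be `blocked`, while the behaviour reported here would show up as `error`. For experiment 2, `dmsetup suspend <name>` and `dmsetup resume <name>` are the usual commands for suspending and resuming IO on a DM device; the exact commands we used are not recorded here.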

How can we reproduce it?

  • Step 1: build, convert, and push a repro image using the following Dockerfile
FROM ubuntu:18.04
RUN apt-get update \
    # TODO: upgrade to JAVA 11 in the next sprint
    && apt-get install -y openjdk-8-jdk git vim \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN git clone https://github.com/macagua/example.java.helloworld.git \
    && cd example.java.helloworld \
    && javac HelloWorld/Main.java \
    && jar cfme Main.jar Manifest.txt HelloWorld.Main HelloWorld/Main.class
RUN echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> ~/.bashrc
  • Step 2: rpull and bash into the container
/opt/overlaybd/snapshotter/ctr -n k8s.io rpull -u $USERNAME:$PASSWORD $IMAGE_REF

ctr -n k8s.io run --snapshotter=overlaybd --rm -t $IMAGE_REF test-jar bash

# Inside the shell, run the `jar` command to load the binary
  • Step 3: shut down the network; we did this by turning off the security group of the VM
  • Step 4: inside the bash shell, run
jar vft ./example.java.helloworld/Main.jar


# After several minutes, we see "error reading zip file" error
root@ip-10-0-0-134:/# jar vft ./example.java.helloworld/Main.jar 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar: error reading zip file
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar: error reading zip file
Exception in thread "main" 
Exception: java.lang.NoClassDefFoundError thrown from the UncaughtExceptionHandler in thread "main"
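
If changing a security group is not an option, step 3 could probably be approximated on the host with a firewall rule. The sketch below is an assumption, not what we did: it drops outbound TCP traffic to the registry port with `iptables`, and the port (443) and the helper names are hypothetical. By default it only prints the commands; set `DRY_RUN=0` and run as root to apply them.

```shell
#!/usr/bin/env bash
# Sketch: emulate "network down" for registry traffic with iptables.
# REGISTRY_PORT=443 is an assumption; adjust it to match your registry.
REGISTRY_PORT="${REGISTRY_PORT:-443}"

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Silently drop outbound packets to the registry port.
block_registry() {
    run iptables -I OUTPUT -p tcp --dport "$REGISTRY_PORT" -j DROP
}

# Remove the rule added by block_registry.
unblock_registry() {
    run iptables -D OUTPUT -p tcp --dport "$REGISTRY_PORT" -j DROP
}
```

`DROP` is used rather than `REJECT` so that connections stall instead of failing immediately, which is closer to what a closed security group looks like from the VM. Whether it triggers exactly the same overlaybd code path has not been verified.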

What is the version of your Accelerated Container Image?

  • overlaybd 0.6.10

What is your OS environment?

Ubuntu

Are you willing to submit PRs to fix it?

  • [ ] Yes, I am willing to fix it.

shuaichang, Jul 17 '23 06:07

To add some more info: as suggested by @liulanzheng offline, applying the following diff and rebuilding overlaybd fixed the issue.

[image: the diff]

shuaichang, Jul 17 '23 06:07

Verified that 0.6.12 fixes the issue. Please feel free to close it. Thank you very much, @liulanzheng, for making the fix!

shuaichang, Jul 25 '23 07:07
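
For readers checking whether a host already has the fix, a small version comparison may help. This is only a sketch: `version_ge` is a hypothetical helper, it relies on `sort -V` from GNU coreutils, and how the installed overlaybd version is obtained is left to the reader. 0.6.12 is the version verified in this thread; whether any earlier release contains the fix is not stated here.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A is greater than or equal to B.
# sort -V orders version strings numerically per component,
# so 0.6.9 sorts before 0.6.12.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage, with the installed version supplied by hand: `version_ge 0.6.10 0.6.12 || echo "older than the verified fix"`.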