openjdk-runtime
openjdk-runtime copied to clipboard
structure testing docker volumes are convoluted and potentially problematic
I think we might need a bit of a redesign here, couldn't solve it quickly, so I'll document the problems I see.
Currently in our build we have the following participants:
- the docker-enabled host (in case of GCCB it's a GCE instance, in case of container-builder-local it's my workstation) starts the build steps
- mvn step is the first (and only) step of our build currently, that spins up the "step-container"
- that kicks off the maven process, which has an execution phase that runs the
structure_test.sh
script -
structure_test.sh
downloadsext_run.sh
- which was seemingly designed to run from a docker-enabled host directly, not a container - and runs it -
ext_run.sh
spins up a "sibling-container" (as it shares now the daemon with the "step-container") from the built openjdk image
Volumes and directories:
GCCB:
-
host:
/workspace
contains the project directory (passed in) -
step-container:
/workspace
is a volume mounted from the/workspace
directory from host (NOTE: in the future this might change to a separate, isolated volume, that is the clone of the /workspace directory so that the steps don't mutate the host filesystem!!) -
sibling-container:
-
/cfg
- that is mounted with-v /workspace/openjdk8/target/classes/.cfg:/cfg
in the case of openjdk8 -
/workspace
- that is coming from the--workspace
argument ofext_run.sh
:--workspace /workspace/openjdk8/target/test-classes/openjdk-test-common-out/workspace
for example
-
The problematic part is that both sibling-container volumes work currently because of the coincidental design overlap - host \workspace
is directly mounted onto \workspace
in the step-container. As soon as this symmetry breaks we have issues.
The reason is that the "sibling-container" is talking to the docker daemon on the host and as such resolves directory based docker volumes from the host.
To see where this reliance already causes problem, let's have a look at the same directories in the container-builder-local case:
-
host:
/home/balopat/dev/proj/openjdk-runtime
contains the project directory (passed in) -
step-container:
/workspace
is a cloned volume mounted from the/home/balopat/dev/proj/openjdk-runtime
directory (NOTE: this is intentionally designed like this and it might be the future on GCE too!) -
sibling-container:
-
/cfg
- that is mounted with-v /workspace/openjdk8/target/classes/.cfg:/cfg
in the case of openjdk8 ==> ERROR, the directory is empty -
/workspace
- that is coming from the--workspace
argument ofext_run.sh
:--workspace /workspace/openjdk8/target/test-classes/openjdk-test-common-out/workspace
for example ==> ERROR, the directory is empty
-
Structure tests fail with this setup.
The problem:
- our current setup doesn't work directly with container-builder-local
- in the future, if/when GCCB changes the way /workspace is mounted on steps (e.g. an isolated, cloned volume), then we'll fail on GCCB as well.
Not exactly sure about the best way to go. Couple of options:
- we can modify
ext_run.sh
to prepare it for running from within a container and pass in explicitly the right volumes - we can potentially completely rewire the execution flow and rely more on GCCB/GCCB-local for each step instead of wrapping the whole flow into maven - this way we could use the runtimes-common structure tests as they were designed to be used - as a step in GCCB
@aslo @nkubala FYI, let me know what you guys think.
Haha, good timing on this one: I'm in the middle of a major redesign of the structure tests as we speak, which will pull the test execution outside of a container. Check out https://github.com/GoogleCloudPlatform/runtimes-common/pull/332 for progress
awesome @nkubala, let's see how that shakes out, it looks promising! Also it seems that based on our conversation the container-builder-local might have support in the future to define workspace as a directory vs the sandbox volume only - that would help as well in our case (b/65417396)
Great work on this analysis! I @nkubala 's change looks promising.