openjdk-runtime icon indicating copy to clipboard operation
openjdk-runtime copied to clipboard

structure testing docker volumes are convoluted and potentially problematic

Open balopat opened this issue 7 years ago • 4 comments

I think we might need a bit of a redesign here, couldn't solve it quickly, so I'll document the problems I see.

Currently in our build we have the following participants:

  • the docker-enabled host (in case of GCCB it's a GCE instance, in case of container-builder-local it's my workstation) starts the build steps
  • mvn step is the first (and only) step of our build currently, that spins up the "step-container"
  • that kicks off the maven process, which has an execution phase that runs the structure_test.sh script
  • structure_test.sh downloads ext_run.sh - which was seemingly designed to run from a docker-enabled host directly, not a container - and runs it
  • ext_run.sh spins up a "sibling-container" (as it shares now the daemon with the "step-container") from the built openjdk image

Volumes and directories:

GCCB:

  • host: /workspace contains the project directory (passed in)
  • step-container: /workspace is a volume mounted from the /workspace directory from host (NOTE: in the future this might change to a separate, isolated volume, that is the clone of the /workspace directory so that the steps don't mutate the host filesystem!!)
  • sibling-container:
    • /cfg - that is mounted with -v /workspace/openjdk8/target/classes/.cfg:/cfg in the case of openjdk8
    • /workspace - that is coming from the --workspace argument of ext_run.sh: --workspace /workspace/openjdk8/target/test-classes/openjdk-test-common-out/workspace for example

The problematic part is that both sibling-container volumes work currently because of the coincidental design overlap - host \workspace is directly mounted onto \workspace in the step-container. As soon as this symmetry breaks we have issues. The reason is that the "sibling-container" is talking to the docker daemon on the host and as such resolves directory based docker volumes from the host.

To see where this reliance already causes problem, let's have a look at the same directories in the container-builder-local case:

  • host: /home/balopat/dev/proj/openjdk-runtime contains the project directory (passed in)
  • step-container: /workspace is a cloned volume mounted from the /home/balopat/dev/proj/openjdk-runtime directory (NOTE: this is intentionally designed like this and it might be the future on GCE too!)
  • sibling-container:
    • /cfg - that is mounted with -v /workspace/openjdk8/target/classes/.cfg:/cfg in the case of openjdk8 ==> ERROR, the directory is empty
    • /workspace - that is coming from the --workspace argument of ext_run.sh: --workspace /workspace/openjdk8/target/test-classes/openjdk-test-common-out/workspace for example ==> ERROR, the directory is empty

Structure tests fail with this setup.

The problem:

  • our current setup doesn't work directly with container-builder-local
  • in the future, if/when GCCB changes the way /workspace is mounted on steps (e.g. an isolated, cloned volume), then we'll fail on GCCB as well.

Not exactly sure about the best way to go. Couple of options:

  • we can modify ext_run.sh to prepare it for running from within a container and pass in explicitly the right volumes
  • we can potentially completely rewire the execution flow and rely more on GCCB/GCCB-local for each step instead of wrapping the whole flow into maven - this way we could use the runtimes-common structure tests as they were designed to be used - as a step in GCCB

balopat avatar Sep 06 '17 21:09 balopat

@aslo @nkubala FYI, let me know what you guys think.

balopat avatar Sep 06 '17 21:09 balopat

Haha, good timing on this one: I'm in the middle of a major redesign of the structure tests as we speak, which will pull the test execution outside of a container. Check out https://github.com/GoogleCloudPlatform/runtimes-common/pull/332 for progress

nkubala avatar Sep 06 '17 21:09 nkubala

awesome @nkubala, let's see how that shakes out, it looks promising! Also it seems that based on our conversation the container-builder-local might have support in the future to define workspace as a directory vs the sandbox volume only - that would help as well in our case (b/65417396)

balopat avatar Sep 07 '17 13:09 balopat

Great work on this analysis! I @nkubala 's change looks promising.

aslo avatar Sep 07 '17 13:09 aslo