docker-plugin icon indicating copy to clipboard operation
docker-plugin copied to clipboard

WIP/RFC fix windows containers - ensureWaiting/injection of the agent

Open stephan48 opened this issue 3 years ago • 2 comments

This is a draft PR which should not be merged just yet. Hence i ignored the template for now, this should be for discussions/strategy planing first, then I will rework this PR into a proper state.

We run a Jenkins instance with a Docker Cloud configured with windows containers, we discovered that there are two blocking issues preventing windows containers from being started when Hyper-V Virtualization is enabled:

  • ensureWaiting -> unless you define a custom command to keep the container in a permanent running state running the container will fail with an obscure Hyper-V message because /bin/sh is not available
  • Hyper-V does not permit the copying of files into a running container. Currently the injection of the Agent Jar is done after the Containers are started, I implemented a crude detection if we have a Windows container and then do it before we initially start the container

The branch in this PR solves both problems in a "hacky" way.

Detection if it is a windows container is done by checking the workdir for existence of "^.:" indicating a structure similar to C:\bla. This I find not too nice but I could not devise a better scheme yet. This should maybe moved to a more global State for a given Container, so we have a single Place to check this.

The second part is the handling of injection, currently in my PR when I see its a Windows container I inject before start other whise after. The question here is.. The beforeContainerStarted callback explicitly advises that it is a good place for injecting the Jar, why is it done in afterContainerStarted? I did not change this because i assume there must be a reason for this.

I attached TODO markers at all points where i seek clarification.

My tests where done on default unchanged linux(jenkins/agent:latest) and windows(jenkins/agent:jdk11-windowsservercore-ltsc2019) containers.

Thank you for your time.

  • [ ] Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • [ ] Ensure that the pull request title represents the desired changelog entry
  • [ ] Please describe what you did
  • [ ] Link to relevant issues in GitHub or Jira
  • [ ] Link to relevant pull requests, esp. upstream and downstream changes
  • [ ] Ensure you have provided tests - that demonstrates feature works or fixes the issue

stephan48 avatar Jan 04 '22 22:01 stephan48

Hmm, yes, that admission from Hyper-V that filesystem operations against a running Hyper-V container are not supported is going to be a problem. I mean, ideally, Microsoft would actually finish implementing Hyper-V and support it but as Microsoft do like to "improve" upon standards and be a bit different, I suspect that hell will freeze over before they publicly admit that it's a deficiency, let alone fix it. So that means that the only realistic option is dodging around the issue instead.

...hmm... that said ... it may be as simple as switching the code from doing the injection in the afterContainerStarted callback to doing it in the beforeContainerStarted method. According to the Javadocs for those methods, the beforeContainerStarted method is the better place to do that injection. It may be complete chance that it worked at all - it may be that you've actually found a (fairly long standing) bug in the plugin, and that it should be beforeContainerStarted for both Windows and Linux containers. (by all means go digging in the history to see what it did before I refactored it - it's quite possible my most recent refactor in this area accidentally changed when things happened)

So, what I'd suggest is that you make a PR (or change this PR) that switches the jar injection to beforeContainerStarted regardless of it being a Linux or Windows container. i.e. no hacky "are we on windows?" checking, just make the change for both Linux and Windows. If that passes the automated tests (as reported by GitHub) then it hasn't broken the Linux container functionality (we actually test that - it's the Windows functionality that isn't tested) and it should be safe to merge. Note: The automated tests are temperamental and prone to weird failures that I've never got to the bottom of. I've found that they work just fine on the VM I use to dev this plugin but they merely "often work" on the official Jenkins CI builds.

...and, if you're interested in having the docker-plugin work on Windows then, ideally, you'd then go on to submit a PR to improve the unit-tests so that they don't skip all the io.jenkins.docker.connector tests when on Windows, so that way we can get automated tests for Windows too (assuming that the Jenkins CI system gives us Windows VMs that have docker capability, that is). Right now, the automated tests just skip anything that needs a docker daemon when running on Windows, and that (plus Microsoft's tendency to be different) is why Windows container support is so poor even though there's really no good reason why this plugin shouldn't be just as able to work on Windows as it does on Linux.

pjdarton avatar May 18 '22 14:05 pjdarton

Thank you very much for the review! I totally agree on your course of action - i will try to look at the unittests too, but can't promise anything.

stephan48 avatar May 18 '22 20:05 stephan48