Add Windows Hostprocess flag
Fixes: #518
Still needs:
- [x] docs
- [x] a bit more thought on how to specific user
- [x] test
The test failed with
Command: C:\Windows\system32\config\systemprofile\go\bin\nerdctl.exe --namespace=nerdctl-test run --rm --host-process -u "NT AUTHORITY\Local Service" gcr.io/k8s-staging-e2e-test-images/busybox:1.29-2 whoami
ExitCode: 1
Error: exit status 1
Stdout:
Stderr: time="2022-02-10T18:18:12Z" level=fatal msg="failed to create user process token: failed to logon user: The user name or password is incorrect.: unknown"
Failures:
I am guessing the containerd process isn't running with high enough privileges in the CI environment as this passes locally with containerd installed with a similar script. @dcantah do you know if there is another built in user we could use here?
needs rebase
Added a logging statement for the who the hostprocess container to see what it thinks it is running as and it is coming back as
run_windows_test.go:35: whoami nt authority\system
@dcantah I would think that since it is system it should be able to start Local Service?
Added a logging statement for the who the hostprocess container to see what it thinks it is running as and it is coming back as
run_windows_test.go:35: whoami nt authority\system@dcantah I would think that since it is system it should be able to start
Local Service?
Yea this is really odd.. The only thing I can think of is there's something blocking logging in users in this environment. The cri-integration tests in Containerd have tests that login as different service accounts https://github.com/containerd/containerd/blob/5162238c7da04d6bdd771d0aa43f8e34c91ea2ca/integration/windows_hostprocess_test.go#L38
What's current status?
I refreshed the PR. It is still failing when running in CI though it works locally, I thought it might be a Cirrus vs github actions thing but doesn't look like it.
I will need to dig deeper, as it is working locally when I test it.
debugging a different problem and got some insight from @marosset, It might be because I am not passing the user name info through on the config. going to try that and see I can get this working. better late than never 🙈
debugging a different problem and got some insight from @marosset, It might be because I am not passing the user name info through on the config. going to try that and see I can get this working. better late than never 🙈
I think that will fix the issue
Nice. We didn't allow a blank username for CRI either, right? That's probably the right way.. defaulting to SYSTEM is kinda scary
Hmm.. if the username string is empty we should error a bit earlier https://github.com/microsoft/hcsshim/blob/6547959343d65e0cf431dd9d589f45ee2b98172c/internal/jobcontainers/logon.go#L109. Maybe it's filling in container-user/container-admin from the image? 🤷♂️
That wasn't the case, It is being passed by the main nerdctl options.
I compared the oci spec to one I have working via the cri server and nothing jumps out. Here is the oci spec that is being generated with nerdctl:
{
"ociVersion": "1.0.2-dev",
"process": {
"user": {
"uid": 0,
"gid": 0,
"username": "nt authority\\local system"
},
"args": [
"cmd.exe",
"/s",
"/c",
"cmd"
],
"env": [
"PATH=C:\\dig\\;C:\\bin;C:\\curl;C:\\Windows\\System32;C:\\Windows;C:\\Program Files\\PowerShell;",
"ProgramFiles=C:\\Program Files",
"LOCALAPPDATA=C:\\Users\\ContainerAdministrator\\AppData\\Local",
"PSModuleAnalysisCachePath=C:\\Users\\ContainerAdministrator\\AppData\\Local\\Microsoft\\Windows\\PowerShell\\docker\\ModuleAnalysisCache",
"PSCORE=C:\\Program Files\\PowerShell\\pwsh.exe"
],
"cwd": ""
},
"root": {
"path": ""
},
"hostname": "1624c258dfdc",
"hooks": {
"createRuntime": [
.... cut this out, hooks are supported in windows so shouldn't have any affect
]
},
"annotations": {
"io.containerd.image.config.stop-signal": "SIGTERM",
"microsoft.com/hostprocess-container": "true",
"nerdctl/extraHosts": "null",
"nerdctl/hostname": "1624c258dfdc",
"nerdctl/name": "busybox-1624c",
"nerdctl/namespace": "default",
"nerdctl/networks": "[\"nat\"]",
"nerdctl/platform": "windows/amd64",
"nerdctl/state-dir": "C:\\ProgramData\\nerdctl\\052055e3\\containers\\default\\1624c258dfdc478e27fdd3f4cb6a0874486f7e1abd2cef426aaeec2c97b908c2"
},
"windows": {
"layerFolders": null,
"ignoreFlushesDuringBoot": true,
"network": {
"allowUnqualifiedDNSQuery": true
}
}
}
and the spec generated with the cri call:
{
"ociVersion": "1.0.2-dev",
"process": {
"user": {
"uid": 0,
"gid": 0,
"username": "NT AUTHORITY\\SYSTEM"
},
"args": [
"ping",
"-t",
"localhost"
],
"env": [
"PATH=C:\\Windows\\system32;C:\\Windows;"
],
"cwd": "C:\\"
},
"annotations": {
"io.kubernetes.cri.container-name": "busybox",
"io.kubernetes.cri.container-type": "container",
"io.kubernetes.cri.image-name": "k8s.gcr.io/pause@sha256:85cfebc79dccc7a0e56680778a614b29a0b1c2ae98d4b1efc746764c04d5656c",
"io.kubernetes.cri.sandbox-id": "0164ef1c9d9a72e904fe45f12e5698e06338ec913798798f2030773ffa0b1b09",
"io.kubernetes.cri.sandbox-name": "sandbox-hp",
"io.kubernetes.cri.sandbox-namespace": "default",
"io.kubernetes.cri.sandbox-uid": "alskasdfasdf",
"microsoft.com/hostprocess-container": "true"
},
"windows": {
"layerFolders": null,
"network": {}
}
}
Can we make a dummy test that prints out the current user that the CI is running as? It feels like a permission issue
Already did https://github.com/containerd/nerdctl/pull/795/commits/f3af71a5937eb9cadf36799538fc9a9318955bb7 but thought it doesn't seem to have output what I thought it should. Let me try something else.
configuration complete! Printing configuration... Service:
Status : Running Name : containerd DisplayName : containerd
Id : 6212 Handles : 148 CPU : 0.140625 SI : 0 Name : containerd
btw, the test that uses hostprocess containers without the user specified passes in CI
I am also able to reproduce this locally where my process is running as a service under nt authority\system
I added a bunch of tests with different inputs, some of them passed. It appears to be the way the paramaters are passed to the binary in the test framework. I will clean it up and I think we will have a few tests passing 🤞
The windows tests are passing. Ends up the NT AUTHORITY must be upper case and I had some extra quotes when they were not needed in the test framework. I will clean up the commits and we should be good for a review.
@AkihiroSuda this is finally ready for review. I did move the Windows CI from Cirrus to Github actions during my debugging, I can either drop that commit or move it to a separate PR but I think I prefer GitHub UI for debugging and it aligns better with rest of the tests.
If Cirrus supports nested virt it might be worth staying on it, otherwise we won't be able to test hyper-v isolation on GHA (unless they made progress on offering this..)
I had a couple of suggestions for the usage but overall LGTM Thanks @jsturtevant
If Cirrus supports nested virt it might be worth staying on it, otherwise we won't be able to test hyper-v isolation on GHA (unless they made progress on offering this..)
great point, It does look like the compute_engine_instance (https://cirrus-ci.org/guide/custom-vms/) does support nested_virtualization: nested_virtualization: true # optional. Whether to enable Intel VT-x. Defaults to false.
I will drop this commit.
It does look like the compute_engine_instance (https://cirrus-ci.org/guide/custom-vms/) does support nested_virtualization: nested_virtualization: true # optional. Whether to enable Intel VT-x. Defaults to false.
Sweet!