
SxS dll loading fails in host process containers running on containerd 1.7

Open chris-raitano opened this issue 1 year ago • 15 comments

Describe the bug My team runs a Ruby script inside a host process container (HPC). This works on containerd 1.6, but after upgrading to containerd 1.7, ruby.exe fails to start with the error message below:

Program 'ruby.exe' failed to run: The application has failed to start because its side-by-side configuration is
incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detailAt
C:\hpc\opt\hostlogswindows\scripts\powershell\main.ps1:63 char:5
    +     & $rubypath ./opt/hostlogswindows/scripts/ruby/tomlparser-hostlog ...
    +     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~.
At C:\hpc\opt\hostlogswindows\scripts\powershell\main.ps1:63 char:5
    +     & $rubypath ./opt/hostlogswindows/scripts/ruby/tomlparser-hostlog ...
    +     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ResourceUnavailable: (:) [], ApplicationFailedException
    + FullyQualifiedErrorId : NativeCommandFailed

The sxstrace.exe tool provides this additional info, which indicates that the manifest isn't found:

Begin Activation Context Generation.
Input Parameter:
        Flags = 0
        ProcessorArchitecture = AMD64
        CultureFallBacks = en-US;en
        ManifestPath = C:\hpc\ruby31\bin\ruby.exe
        AssemblyDirectory = C:\hpc\ruby31\bin\
        Application Config File =
-----------------
INFO: Parsing Manifest File C:\hpc\ruby31\bin\ruby.exe.
        INFO: Manifest Definition Identity is (null).
        INFO: Reference: ruby_builtin_dlls,type="win32",version="1.0.0.0"
INFO: Resolving reference ruby_builtin_dlls,type="win32",version="1.0.0.0".
        INFO: Resolving reference for ProcessorArchitecture ruby_builtin_dlls,type="win32",version="1.0.0.0".
                INFO: Resolving reference for culture Neutral.
                        INFO: Applying Binding Policy.
                                INFO: No binding policy redirect found.
                        INFO: Begin assembly probing.
                                INFO: Did not find the assembly in WinSxS.
                                INFO: Attempt to probe manifest at C:\hpc\ruby31\bin\ruby_builtin_dlls.DLL.
                                INFO: Attempt to probe manifest at C:\hpc\ruby31\bin\ruby_builtin_dlls.MANIFEST.
                                INFO: Attempt to probe manifest at C:\hpc\ruby31\bin\ruby_builtin_dlls\ruby_builtin_dlls.DLL.
                                INFO: Attempt to probe manifest at C:\hpc\ruby31\bin\ruby_builtin_dlls\ruby_builtin_dlls.MANIFEST.
                                INFO: Did not find manifest for culture Neutral.
                        INFO: End assembly probing.
        ERROR: Cannot resolve reference ruby_builtin_dlls,type="win32",version="1.0.0.0".
ERROR: Activation Context generation failed.
End Activation Context Generation.

However, the manifest does exist at the expected location (C:\hpc\ruby31\bin\ruby_builtin_dlls\ruby_builtin_dlls.manifest) with the following content:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
      <assemblyIdentity type="win32" name="ruby_builtin_dlls" version="1.0.0.0"></assemblyIdentity>
 
      <file name="libffi-8.dll"/><file name="libgmp-10.dll"/><file name="libwinpthread-1.dll"/><file name="libyaml-0-2.dll"/> 
      <file name="zlib1.dll"/><file name="libcrypto-1_1-x64.dll"/><file name="libgcc_s_seh-1.dll"/><file name="libssl-1_1-x64.dll"/>
    </assembly>
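
To make the comparison between the trace and the manifest location concrete, here is a minimal Python sketch that rebuilds the four neutral-culture probe paths listed in the sxstrace output and checks whether a given manifest path is among them, ignoring case. The function names `private_probe_paths` and `is_probed` are made up for illustration. The sketch only mirrors the probes visible in the trace above; it is not a full model of the Windows side-by-side search order.

```python
import ntpath  # Windows path rules, usable on any OS


def private_probe_paths(app_dir, assembly):
    """Return the neutral-culture probe paths in the order the trace lists them."""
    return [
        ntpath.join(app_dir, assembly + ".DLL"),
        ntpath.join(app_dir, assembly + ".MANIFEST"),
        ntpath.join(app_dir, assembly, assembly + ".DLL"),
        ntpath.join(app_dir, assembly, assembly + ".MANIFEST"),
    ]


def is_probed(manifest_path, app_dir, assembly):
    """True if manifest_path equals one of the probe paths, ignoring case."""
    wanted = ntpath.normcase(manifest_path)
    return any(
        ntpath.normcase(path) == wanted
        for path in private_probe_paths(app_dir, assembly)
    )


if __name__ == "__main__":
    # AssemblyDirectory and assembly name are taken from the trace above.
    for path in private_probe_paths("C:\\hpc\\ruby31\\bin\\", "ruby_builtin_dlls"):
        print(path)
```

With the values from the trace, the fourth probe path is the same file as the manifest location given above (differing only in the case of the extension), so the loader appears to be looking in the right place and still not finding the file.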

We are able to run Ruby by copying the binaries outside of C:\hpc, but can't run it inside C:\hpc. (In other words, we can run it inside the container if the container copies the files onto the host filesystem, but we lose the benefits of filesystem isolation.)

Since this works both in a containerd 1.6 hpc and in containerd 1.7 outside of the C:\hpc directory, my guess is that it's related to the new bind mount used for the C:\hpc directory.
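
One way to narrow this down is to check whether ordinary file APIs can see the same files under both locations. The sketch below is only an illustration and assumes a Python interpreter is available in the container; the helper names and the second root (`C:\ruby31\bin`, standing in for a copy on the host filesystem) are hypothetical. If nothing is reported missing under either root while ruby.exe still only starts from the copy, that would be consistent with the files being present and the failure lying in how the SxS loader resolves paths under the bind mount, though it would not prove it.

```python
import os


def missing_files(root, rel_paths):
    """Return the entries of rel_paths that os.path.isfile cannot see under root."""
    return [
        rel
        for rel in rel_paths
        if not os.path.isfile(os.path.join(root, *rel.split("/")))
    ]


def compare_roots(roots, rel_paths):
    """Map each root directory to the relative paths that are not visible there."""
    return {root: missing_files(root, rel_paths) for root in roots}


if __name__ == "__main__":
    rel_paths = ["ruby.exe", "ruby_builtin_dlls/ruby_builtin_dlls.manifest"]
    # First root is the bind-mounted container path from the report;
    # second root is a hypothetical copy on the host filesystem.
    roots = [r"C:\hpc\ruby31\bin", r"C:\ruby31\bin"]
    for root, missing in compare_roots(roots, rel_paths).items():
        print(root, "missing:", missing or "nothing")
```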

To Reproduce I've created a lightweight container that we've been able to use to reproduce this.

Dockerfile

FROM mcr.microsoft.com/windows/servercore:ltsc2019

# Install chocolatey
ENV chocolateyVersion 1.4.0
RUN powershell -Command "Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))"
# Install ruby
RUN choco install -y ruby --version 3.1.1.1 --params "'/InstallDir:C:\ruby31'"

COPY main.ps1 /main.ps1

ENTRYPOINT ["powershell", "C:\\hpc\\main.ps1"]

main.ps1

& ./ruby31/bin/ruby.exe --version

while($true){
    Start-Sleep 3600
}

We've deployed it to our Kubernetes cluster with the YAML below, replacing <IMAGE> with the built container image from our container registry:

apiVersion: apps/v1
kind: DaemonSet
metadata:
 name: ruby-test
 labels:
  app: ruby-test
spec:
 selector:
  matchLabels:
    app: ruby-test
 template:
  metadata:
    labels:
      app: ruby-test
  spec:
    securityContext:
      windowsOptions:
        hostProcess: true
        runAsUserName: "NT AUTHORITY\\SYSTEM"
    hostNetwork: true
    containers:
     - name: ruby-test
       image: <IMAGE>
       imagePullPolicy: Always
       workingDir: /hpc
    nodeSelector:
      kubernetes.io/os: windows

Expected behavior Executables that use side-by-side DLLs should run inside a host process container. When an exe runs inside a container with a working directory in the container filesystem (C:\hpc), we'd expect the SxS DLL loader to find manifests and load DLLs from the container filesystem.

Configuration:

  • Edition: Windows Server 2019
  • Base Image being used: Windows Server Core 2019
  • Container engine: containerd
  • Container engine version: 1.7

Additional context

chris-raitano, May 23 '23 22:05