Process Isolation is very slow as compared to HyperV Containers on Server 2019

Open saraf-akshay opened this issue 1 year ago • 27 comments

Describe the bug Slowness in cloning source when running multiple containers simultaneously in process isolation.

| Isolation Mode | Time in Git clone | Containers running in parallel | Comments |
|---|---|---|---|
| Process | 9 mins | 1 | |
| HyperV | 8.5 mins | 1 | |
| Process | 21 mins | 10 | <-- This is the problem |
| HyperV | 11 mins | 10 | |

As the number of containers on the server increases, container performance degrades significantly, but only in process isolation. I am not worried about minor performance differences. The same degradation appears when I compile in these containers using nmake.
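To put numbers on the gap, the timings reported above can be turned into ratios. This is only arithmetic on the four reported measurements; the dictionary layout and the function names are illustrative, not part of the original setup.

```python
# Clone times in minutes, copied from the measurements reported above.
# Key: (isolation mode, containers running in parallel).
clone_minutes = {
    ("process", 1): 9.0,
    ("hyperv", 1): 8.5,
    ("process", 10): 21.0,
    ("hyperv", 10): 11.0,
}


def slowdown(parallel: int) -> float:
    """Process-isolation time divided by Hyper-V time at a given parallelism."""
    return clone_minutes[("process", parallel)] / clone_minutes[("hyperv", parallel)]


def scaling(isolation: str) -> float:
    """How much longer a clone takes with 10 parallel containers than with 1."""
    return clone_minutes[(isolation, 10)] / clone_minutes[(isolation, 1)]


for n in (1, 10):
    print(f"{n:>2} container(s): process / HyperV = {slowdown(n):.2f}x")
for mode in ("process", "hyperv"):
    print(f"{mode}: 1 -> 10 containers = {scaling(mode):.2f}x")
```

With a single container the two modes are within about 6% of each other. With 10 containers process isolation takes roughly 1.9x as long as HyperV, and its own clone time grows about 2.3x compared with about 1.3x for HyperV.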

These 10 containers I mentioned above are triggered by a Jenkins pipeline using Kubernetes. Here is the yaml code I used:

apiVersion: v1
kind: Pod
spec:
  tolerations:
  - effect: NoSchedule
    key: custom/build-hosts
    operator: Exists
  containers:
  - name: jnlp
    image: <image link redacted>
    command:
    - powershell
    args:
    - cp -R C:\\privconf\\*  C:\\Users\\ContainerAdministrator;
    - C:\\jenkinsscript\\jenkins.ps1
    resources:
      limits:
        cpu: 12
        memory: 16Gi
      requests:
        cpu: 12
        memory: 16Gi
    env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_HOST_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    volumeMounts:
    - mountPath: /privconf
      name: credential-volume
    - mountPath: /gitcache
      name: cache-volume
    - mountPath: /jenkinsscript
      name: jenkins-script
  volumes:
  - hostPath:
      path: D:/agentconf
      type: ""
    name: credential-volume
  - hostPath:
      path: D:/agentcache
      type: ""
    name: cache-volume
  - configMap:
      defaultMode: 420
      name: jenkins-script
    name: jenkins-script
  nodeSelector:
    custom/fcds: test_akshay

The HyperV data was gathered using Docker Swarm, because K8S doesn't support HyperV isolation.

dockerSwarm {
    label "docker-agent"
    image "<image link redacted>"
    limitsNanoCPUs 12000000000
    limitsMemoryBytes 17179860384
    reservationsNanoCPUs 12000000000
    reservationsMemoryBytes 17179860384
}
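As a sanity check that both setups request comparable resources, the Swarm limits can be converted into the units used in the Kubernetes yaml. The sketch below is plain unit conversion on the values quoted above; the variable names are illustrative.

```python
NANO = 1_000_000_000  # Docker expresses CPU limits in billionths of a CPU
GIB = 1024 ** 3       # Kubernetes "Gi" is a power-of-two gibibyte

# Values from the dockerSwarm block above.
swarm_nano_cpus = 12_000_000_000
swarm_memory_bytes = 17_179_860_384

# Values from the Pod yaml above (cpu: 12, memory: 16Gi).
k8s_cpus = 12
k8s_memory_gib = 16

swarm_cpus = swarm_nano_cpus / NANO
k8s_memory_bytes = k8s_memory_gib * GIB
memory_gap = k8s_memory_bytes - swarm_memory_bytes

print(f"Swarm CPUs: {swarm_cpus:g} (Kubernetes: {k8s_cpus})")
print(f"16Gi in bytes: {k8s_memory_bytes}")
print(f"Swarm memory limit is {memory_gap} bytes below 16Gi")
```

The CPU limits match exactly at 12 CPUs. The Swarm memory value is 8,800 bytes short of a full 16 GiB (17,179,869,184 bytes), a difference far too small to explain the timing gap.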

The physical host that I ran it on is a bare-metal server with 208 logical cores (104 physical cores) with Hyper-Threading enabled.

To Reproduce Please trigger 10 parallel containers on the same host at the exact same time, cloning the exact same repository, and that way you should be able to reproduce the issue.
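For anyone trying to reproduce this without Jenkins, Kubernetes or Swarm, a minimal sketch of the same experiment using plain `docker run` might look like the following. It is a simplification of the setup described above, not the original pipeline. The image name, repository URL and destination path are placeholders, and it assumes the image has `git` on its PATH. The `--isolation`, `--cpus`, `--memory` and `--entrypoint` flags are standard `docker run` options.

```python
import subprocess
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def build_command(isolation, image, repo_url, cpus=12, memory="16g"):
    """Build a `docker run` command that clones repo_url inside a container.

    isolation is "process" or "hyperv". image and repo_url are placeholders
    supplied by the caller; C:\\src is an arbitrary destination path.
    """
    return [
        "docker", "run", "--rm",
        f"--isolation={isolation}",
        f"--cpus={cpus}",
        f"--memory={memory}",
        "--entrypoint", "git",  # bypass the image's own entrypoint
        image,
        "clone", repo_url, "C:\\src",
    ]


def timed_run(command, barrier, runner):
    """Wait until every worker is ready, then run the command and time it."""
    barrier.wait()
    start = time.perf_counter()
    runner(command)
    return time.perf_counter() - start


def run_parallel(command, count, runner=None):
    """Start `count` copies of the command together; return seconds for each."""
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd, check=True)
    barrier = threading.Barrier(count)
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [
            pool.submit(timed_run, command, barrier, runner)
            for _ in range(count)
        ]
        return [future.result() for future in futures]


# Example (requires Docker on a Windows host; not executed here):
# durations = run_parallel(build_command("process", "<image>", "<repo url>"), 10)
# print(max(durations) / 60, "minutes for the slowest clone")
```

The barrier makes all workers launch at the same moment, which matches the "exact same time" condition. Running it once with `"process"` and once with `"hyperv"` gives the two numbers to compare.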

Expected behavior The expectation is for Process Isolation to work on par or better than HyperV Isolation.

Configuration:

  • Edition: Windows Server 2019
  • Base Image being used: jenkins/inbound-agent:3107.v665000b_51092-7-jdk11-windowsservercore-ltsc2019
  • Container engine: Docker
  • Container Engine version:
Client:
Version:           25.0.0
API version:       1.44
Go version:        go1.21.6
Git commit:        e758fe5
Built:             Thu Jan 18 17:10:49 2024
OS/Arch:           windows/amd64
Context:           default

Server: Docker Engine - Community
Engine:
 Version:          25.0.0
 API version:      1.44 (minimum version 1.24)
 Go version:       go1.21.6
 Git commit:       615dfdf
 Built:            Thu Jan 18 17:09:34 2024
 OS/Arch:          windows/amd64
 Experimental:     false

Additional context

I have verified that there is no resource over-provisioning. Windows Defender is disabled, and all my processes (including git and git-lfs) and the directories where source code is checked out are on the exclusion list, as mentioned here: https://github.com/microsoft/Windows-Containers/issues/149 I have also verified that I have the Defender fix, which was released here: https://github.com/microsoft/Windows-Containers/issues/345

saraf-akshay avatar Jan 25 '24 21:01 saraf-akshay

Hey @saraf-akshay, could you share what you're seeing with Windows Server 2022 process isolation?

We no longer ship OS-level fixes for Windows Server 2019 because it is now out of mainstream support (we only address security fixes): https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2019

fady-azmy-msft avatar Jan 29 '24 16:01 fady-azmy-msft

@fady-azmy-msft : Thanks for your response. I'm working on preparing a server with Server 2022. It might take a couple days. I'll keep you posted.

saraf-akshay avatar Jan 29 '24 22:01 saraf-akshay

@fady-azmy-msft, @ntrappe-msft: There is still slowness.

Server 2022 is a lot better than Server 2019. Server 2019 was 2x slower, whereas Server 2022 is 1.25x slower in Process Isolation compared to HyperV Isolation when I run 10 containers in parallel on a host (essentially trying to run the host at full capacity) with the resource (CPU and memory) restrictions shown in the yaml file in my first comment.

saraf-akshay avatar Feb 06 '24 23:02 saraf-akshay

Here is what I have experienced with process isolation compared to Hyper-V isolation. I have seen cascading container failures and even containers that crash and can never recover; they have to be redeployed. The performance of my SHIR containers is night-and-day better now with Hyper-V isolation.

Host: running 2019 DC. Container: 2019 core latest.

https://github.com/Azure/Azure-Data-Factory-Integration-Runtime-in-Windows-Container/issues/7

nickcva avatar Feb 14 '24 00:02 nickcva

Hello @Howard-Haiyang-Hao @fady-azmy-msft @ntrappe-msft
Just checking in. Any update on this?

saraf-akshay avatar Mar 27 '24 21:03 saraf-akshay

This issue has been open for 30 days with no updates. @Howard-Haiyang-Hao, please provide an update or close this issue.

I'm now running 80+ SHIR containers with Hyper-V isolation successfully, with little to no issues. Without Hyper-V isolation the most I could run was about 25, and that also created issues that caused containers to completely corrupt themselves at random. Please make a Linux-compatible SHIR application for ADF / Synapse!

nickcva avatar Jun 04 '24 16:06 nickcva

Can you run this without using host paths?

doctorpangloss avatar Jul 11 '24 15:07 doctorpangloss
