
Static Docker containers need to be restarted with `--cpuset-cpus="0-3"`

Haroon-Khel opened this issue 1 year ago • 5 comments

ref https://github.com/adoptium/infrastructure/issues/3360#issuecomment-1924438777

All of our containers need to be restarted with the correct option to assign 4 CPUs: `--cpuset-cpus="0-3"` needs to be used instead of `--cpus=4.0`. That way the test jobs read the number of CPUs available to the container, instead of the 160 cores on the dockerhost, and set the concurrency accordingly. At the moment test jobs are running with `-concurrency:81` on the containers when it should be `-concurrency:3`.
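Why the two options behave differently can be sketched in Python. `--cpus=4.0` only sets a CPU-time quota, so a process in the container can still be scheduled on every host CPU, whereas `--cpuset-cpus="0-3"` restricts the CPUs the container may run on, and that restriction shows up in the process's CPU affinity mask. The machine-wide count from `os.cpu_count()` is typically unaffected by either option. The figures in this issue also fit a rule of "half the visible CPUs, plus one" (160 cores give 81, 4 give 3). This is a minimal sketch assuming a Linux host; the helper names are hypothetical and this is not the test jobs' actual logic.

```python
import os


def host_cpus() -> int:
    """CPUs reported for the whole machine; typically unchanged by --cpus or --cpuset-cpus."""
    return os.cpu_count() or 1


def visible_cpus() -> int:
    """CPUs this process is allowed to run on.

    In a container started with --cpuset-cpus="0-3" the affinity mask holds
    only CPUs 0-3, so this returns 4. With --cpus=4.0 the mask still holds
    every host CPU, because that option only sets a CPU-time quota.
    Falls back to the machine-wide count where sched_getaffinity is missing.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return host_cpus()


def concurrency_from_cpus(cpus: int) -> int:
    """Hypothetical rule matching the figures in this issue: half the CPUs, plus one."""
    return cpus // 2 + 1


if __name__ == "__main__":
    cpus = visible_cpus()
    print(f"host CPUs: {host_cpus()}, visible CPUs: {cpus}")
    print(f"concurrency: {concurrency_from_cpus(cpus)}")
```

Run inside a container started with `--cpus=4.0`, this would report the host's CPU count for both lines; with `--cpuset-cpus="0-3"` the visible count drops to 4.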

The following nodes have been restarted with `--cpuset-cpus="0-3"`:

dockerhost-equinix-ubuntu2204-armv8-1

  • https://ci.adoptium.net/computer/test-docker-debain12-armv8l-1/
  • All of https://ci.adoptium.net/label/hw.dockerhost.arm.dockerhost-equinix-ubuntu2204-armv8-1/
  • https://ci.adoptium.net/computer/test-docker-alpine319-armv8-1/

dockerhost-equinix-ubuntu2004-armv8-1

  • https://ci.adoptium.net/computer/test-docker-sles15-armv8l-1/
  • https://ci.adoptium.net/computer/test-docker-fedora39-armv8l-1/

Haroon-Khel, Feb 08 '24 12:02

@Haroon-Khel Is test-docker-sles15-armv8l-1 based on the BCI image referenced in https://github.com/adoptium/infrastructure/issues/3135?

sxa, Feb 08 '24 15:02

Note: This PR should cap test concurrency at whichever is smaller of:

  • (0.5*cores)+1, or
  • (0.5*gigs-of-memory)

Also, we calculate "memory" as either the machine memory or the cgroup (container) memory, whichever is smaller.
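A minimal Python sketch of the cap described above, assuming a Linux host or container, that "gigs" means GiB, and that fractional results are rounded down. The file paths are the standard cgroup v2 (`memory.max`) and cgroup v1 (`memory.limit_in_bytes`) locations as seen from inside a container; the function names and the floor of one job are illustrative assumptions, not the test framework's actual code.

```python
import os
from typing import Optional

GIB = 1024 ** 3  # assumption: "gigs" means GiB


def host_memory_bytes() -> Optional[int]:
    """Machine memory from /proc/meminfo (Linux only); None if unavailable."""
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    return int(line.split()[1]) * 1024  # MemTotal is given in kB
    except (OSError, ValueError):
        pass
    return None


def cgroup_memory_limit_bytes() -> Optional[int]:
    """Container memory limit (cgroup v2, then v1); None if unlimited or unavailable."""
    for path in ("/sys/fs/cgroup/memory.max",                     # cgroup v2
                 "/sys/fs/cgroup/memory/memory.limit_in_bytes"):  # cgroup v1
        try:
            with open(path) as f:
                raw = f.read().strip()
            if raw == "max":  # cgroup v2 spelling of "no limit"
                return None
            value = int(raw)
        except (OSError, ValueError):
            continue
        # cgroup v1 reports "no limit" as a huge page-aligned value just below 2**63.
        return None if value >= 2 ** 62 else value
    return None


def effective_memory_bytes(host: Optional[int], cgroup: Optional[int]) -> Optional[int]:
    """'Memory' as defined above: the smaller of machine memory and cgroup memory."""
    known = [m for m in (host, cgroup) if m is not None]
    return min(known) if known else None


def capped_concurrency(cores: int, memory_bytes: Optional[int]) -> int:
    """Smaller of (0.5*cores)+1 and 0.5*GiB of memory, both rounded down."""
    by_cores = cores // 2 + 1
    if memory_bytes is None:
        return by_cores
    by_memory = memory_bytes // (2 * GIB)  # 0.5 * memory in GiB, floored
    # Assumption (not from the issue): never go below one concurrent job.
    return max(1, min(by_cores, by_memory))


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        cores = len(os.sched_getaffinity(0))
    else:
        cores = os.cpu_count() or 1
    memory = effective_memory_bytes(host_memory_bytes(), cgroup_memory_limit_bytes())
    print(f"cores={cores} memory={memory} concurrency={capped_concurrency(cores, memory)}")
```

For a 4-CPU container with a 4 GiB cgroup limit on a much larger host, the cores rule gives 3 and the memory rule gives 2, so the cap would be 2.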

adamfarley, Feb 14 '24 13:02

✅ indicates that the container has been rerun with `--cpuset-cpus="0-3"`.

[
    {
        "name": "dockerhost-equinix-ubuntu2004-armv8-1",
        "ip": "147.75.35.203",
        "containers": [
            "build-docker-ubuntu2004-armv7l-1",  Does not exist on machine
            "test-docker-alpine313-aarch64-1", replaced by test-docker-alpine319-armv8-2 ✅ 
            "test-docker-alpine314-aarch64-1", replaced by test-docker-alpine319-armv8-4 ✅ 
            "test-docker-fedora39-armv8l-1", ✅ 
            "test-docker-sles15-armv8l-1", ✅ 
            "test-docker-ubuntu1804-armv8l-4", ✅ 
            "test-docker-ubuntu2004-armv7l-1",
            "test-docker-ubuntu2004-armv7l-2",
            "test-docker-ubuntu2004-armv7l-3",
            "test-docker-ubuntu2004-armv8l-1", ✅ 
            "test-docker-ubuntu2004-armv8l-2", ✅ 
            "test-docker-ubuntu2004-armv8l-3", ✅ 
            "test-docker-ubuntu2204-armv8l-2", ✅ 
            "test-docker-ubuntu2310-armv8l-1" ✅ 
        ],
        "containersCount": 14
    },
    {
        "name": "dockerhost-equinix-ubuntu2004-x64-1",
        "ip": "145.40.114.58",
        "containers": [
            "test-docker-alpine314-x64-1",
            "test-docker-alpine317-x64-1",
            "test-docker-centos8-x64-1",
            "test-docker-debian11-x64-1",
            "test-docker-fedora35-x64-1",
            "test-docker-fedora37-x64-1",
            "test-docker-fedora37-x64-3",
            "test-docker-ubi8-x64-1",
            "test-docker-ubuntu2004-x64-1",
            "test-docker-ubuntu2204-x64-1",
            "test-docker-ubuntu2204-x64-3"
        ],
        "containersCount": 11
    },
    {
        "name": "dockerhost-equinix-ubuntu2204-armv8-1",
        "ip": "139.178.86.243",
        "containers": [
            "test-docker-alpine314-armv8-1", replaced by test-docker-alpine319-armv8-3 ✅ 
            "test-docker-alpine314-armv8-3", duplicate of test-docker-alpine314-armv8-1
            "test-docker-alpine315-armv8-2", exists in Jenkins but not on dockerhost (ghost)
            "test-docker-alpine319-armv8-1", ✅ 
            "test-docker-debain12-armv8l-1", ✅ 
            "test-docker-ubuntu2004-armv7l-4", ✅ 
            "test-docker-ubuntu2004-armv7l-5", ✅ 
            "test-docker-ubuntu2004-armv7l-6", ✅ 
            "test-docker-ubuntu2204-armv8-1", ✅ 
            "test-docker-ubuntu2204-armv8-2", ✅ 
            "test-docker-ubuntu2204-armv8-3" ✅ 
        ],
        "containersCount": 11
    },
    {
        "name": "dockerhost-equinix-ubuntu2204-x64-1",
        "ip": "145.40.113.173",
        "containers": [
            "test-docker-alpine314-x64-2",
            "test-docker-alpine317-x64-2",
            "test-docker-centos8-x64-2",
            "test-docker-debian11-x64-2",
            "test-docker-fedora35-x64-2",
            "test-docker-fedora37-x64-2",
            "test-docker-ubi8-x64-2",
            "test-docker-ubuntu2004-x64-2",
            "test-docker-ubuntu2204-x64-2"
        ],
        "containersCount": 9
    },
    {
        "name": "dockerhost-marist-ubuntu2204-s390x-1",
        "ip": "148.100.74.237",
        "containers": [
            "test-docker-sles12-s390x-1", ✅ 
            "test-docker-sles15-s390x-1" ✅ 
        ],
        "containersCount": 2
    },
    {
        "name": "dockerhost-osuosl-ubuntu2004-ppc64le-1",
        "ip": "140.211.168.214",
        "containers": [
            "docker-osuosl-ubuntu2004-ppc64le-1", duplicate of dockerhost-osuosl-ubuntu2004-ppc64le-1
            "test-docker-fedora33-ppc64le-1", replaced with test-docker-fedora39-ppc64le-1
            "test-docker-ubuntu1804-ppc64le-1", replaced with test-docker-ubuntu2004-ppc64le-1
            "test-docker-ubuntu2010-ppc64le-1" replaced with test-docker-ubuntu2204-ppc64le-3
        ],
        "containersCount": 4
    },
    {
        "name": "dockerhost-osuosl-ubuntu2204-aarch64-1",
        "ip": "140.211.167.67",
        "containers": [],
        "containersCount": 0
    },
    {
        "name": "dockerhost-rise-ubuntu2204-aarch64-1",
        "ip": "34.72.108.242",
        "containers": [],
        "containersCount": 0
    },
    {
        "name": "dockerhost-skytap-ubuntu2004-ppc64le-1",
        "ip": "20.61.136.212",
        "containers": [
            "test-docker-debian11-ppc64le-1", ✅ 
            "test-docker-debian11-ppc64le-2", ✅ 
            "test-docker-debian11-ppc64le-3", ✅ 
            "test-docker-debian11-ppc64le-4", ✅ 
            "test-docker-ubuntu2204-ppc64le-1", ✅ 
            "test-docker-ubuntu2204-ppc64le-2" ✅ 
        ],
        "containersCount": 6
    },
    {
        "name": "dockerhost-skytap-ubuntu2204-x64-1",
        "ip": "20.61.136.254",
        "containers": [
            "test-docker-debian12-x64-1", ✅ 
            "test-docker-fedora39-x64-1", ✅ 
            "test-docker-ubuntu2204-x64-4", ✅ 
            "test-docker-ubuntu2204-x64-5" ✅ 
        ],
        "containersCount": 4
    }
]
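To double-check the ✅ marks, Docker can report what each container was started with: `docker inspect` shows `--cpuset-cpus` as `HostConfig.CpusetCpus` (an empty string when unset) and `--cpus` as `HostConfig.NanoCpus`. The following is a minimal sketch, assuming the Docker CLI is on the PATH of the dockerhost; the helper names are hypothetical, and the thread does not say this check was used.

```python
import json
import subprocess
import sys
from typing import Dict


def cpuset_size(cpuset: str) -> int:
    """Count CPUs in a cpuset list such as "0-3" or "0-1,8"; "" (no pinning) gives 0."""
    count = 0
    for part in filter(None, cpuset.split(",")):
        if "-" in part:
            first, last = part.split("-")
            count += int(last) - int(first) + 1
        else:
            count += 1
    return count


def pinned_cpus(inspect_output: str) -> Dict[str, int]:
    """Map each container in `docker inspect` JSON output to its number of pinned CPUs."""
    result = {}
    for container in json.loads(inspect_output):
        name = container["Name"].lstrip("/")  # docker inspect prefixes names with "/"
        cpuset = container["HostConfig"].get("CpusetCpus") or ""
        result[name] = cpuset_size(cpuset)
    return result


def docker_inspect(*names: str) -> str:
    """Run `docker inspect` on the given containers and return its JSON text."""
    return subprocess.run(["docker", "inspect", *names],
                          check=True, capture_output=True, text=True).stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, cpus in pinned_cpus(docker_inspect(*sys.argv[1:])).items():
        status = "OK" if cpus == 4 else "needs restart"
        print(f"{name}: {status} ({cpus} pinned CPUs)")
```

Passing container names from the inventory as arguments would print one line per container; a container still running with only `--cpus=4.0` would show 0 pinned CPUs.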

Haroon-Khel, Feb 16 '24 12:02

Annoyingly, rerunning a container with different parameters isn't as simple as `docker stop $container` followed by `docker start $container --new-options`. As far as I can tell, I need to stop the running container, remove it, and then run a new container from the same image with the new options. I can use https://github.com/adoptium/infrastructure/blob/master/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/dockernode.yml to automate this, but I need to update some of the Dockerfiles first.
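The stop, remove, and run sequence described above can be sketched as a small Python helper that builds the three Docker commands and only executes them when asked. The image name, the port mapping, and the helper names are hypothetical placeholders: the real containers are created by `dockernode.yml`, and their actual run options would need to be carried over. Docker also has `docker update --cpuset-cpus`, which changes the cpuset of an existing container in place and may avoid the recreate step, though that route was not tried in this issue.

```python
import subprocess
from typing import List, Sequence


def recreate_commands(name: str, image: str, cpuset: str = "0-3",
                      extra_run_args: Sequence[str] = ()) -> List[List[str]]:
    """Docker commands that replace container `name` with a CPU-pinned copy of `image`."""
    return [
        ["docker", "stop", name],
        ["docker", "rm", name],
        ["docker", "run", "-d", "--name", name,
         f"--cpuset-cpus={cpuset}", *extra_run_args, image],
    ]


def recreate(name: str, image: str, dry_run: bool = True, **kwargs) -> None:
    """Print each command; only execute when dry_run is False (stops at the first failure)."""
    for cmd in recreate_commands(name, image, **kwargs):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical image and port mapping; real values would need to match dockernode.yml.
    recreate("test-docker-fedora39-armv8l-1", "example/fedora39-test:latest",
             extra_run_args=["-p", "2222:22"])
```

Removing the container discards anything written inside it that is not on a volume, so everything the node needs has to come from the image or the run options.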

Haroon-Khel, Feb 16 '24 12:02

I won't restart the x64 Equinix nodes, since we want to start decommissioning them anyway (as per https://github.com/adoptium/infrastructure/issues/3378#issuecomment-1938595634).

Haroon-Khel, Feb 16 '24 16:02

With the x64 Equinix dockerhost machines decommissioned, only the ppc64le nodes on dockerhost-osuosl-ubuntu2004-ppc64le-1 are left.

Haroon-Khel, Apr 11 '24 09:04

The dockerhost-osuosl-ubuntu2004-ppc64le-1 nodes have been restarted. The issue is closed.

Haroon-Khel, Apr 11 '24 14:04