cloud-provider-gcp
Migrate e2e tests for GPUs
- Migrates 3 GPU upgrade/downgrade e2e tests from k/k in-tree to this cloud-provider-gcp repository. Migrated from file: https://github.com/kubernetes/kubernetes/blob/release-1.30/test/e2e/cloud/gcp/node/gpu.go
- Migrates 2 Nvidia GPU e2e tests from k/k in-tree to this cloud-provider-gcp repository. Migrated from file: https://github.com/kubernetes/kubernetes/blob/release-1.30/test/e2e/scheduling/nvidia-gpus.go
- Migrates 1 Stackdriver instrumentation e2e test from k/k in-tree to this cloud-provider-gcp repository. Migrated from file: https://github.com/kubernetes/kubernetes/blob/release-1.30/test/e2e/instrumentation/monitoring/accelerator.go
Status
Currently failing: no pods controlled by the nvidia-driver-installer daemonset appear before the 60-second wait times out.
Summarizing 6 Failures:
[FAIL] [cloud-provider-gcp-e2e] Stackdriver Monitoring [It] should have accelerator metrics
/home/sean/go/src/k8s.io/cloud-provider-gcp/test/e2e/nvidia-gpu.go:206
[FAIL] [cloud-provider-gcp-e2e] GPUDevicePluginAcrossRecreate [It] run Nvidia GPU Device Plugin tests with a recreation
/home/sean/go/src/k8s.io/cloud-provider-gcp/test/e2e/nvidia-gpu.go:206
[FAIL] [cloud-provider-gcp-e2e] Device Plugin GPUs [It] run Nvidia GPU Device Plugin tests
/home/sean/go/src/k8s.io/cloud-provider-gcp/test/e2e/nvidia-gpu.go:206
[FAIL] [cloud-provider-gcp-e2e] GPU Upgrade cluster upgrade [It] should be able to run gpu pod after upgrade
/home/sean/go/src/k8s.io/cloud-provider-gcp/test/e2e/nvidia-gpu.go:206
[FAIL] [cloud-provider-gcp-e2e] GPU Upgrade master upgrade [It] should NOT disrupt gpu pod
/home/sean/go/src/k8s.io/cloud-provider-gcp/test/e2e/nvidia-gpu.go:206
[FAIL] [cloud-provider-gcp-e2e] GPU Upgrade cluster downgrade [It] should be able to run gpu pod after downgrade
/home/sean/go/src/k8s.io/cloud-provider-gcp/test/e2e/nvidia-gpu.go:206
Ran 9 of 290 Specs in 192.499 seconds
FAIL! -- 3 Passed | 6 Failed | 0 Pending | 281 Skipped
[FAILED] failed to get pods controlled by the nvidia-driver-installer daemonset: Timed out after 60.000s.
expected at least 1 pods, only got 0
I0529 17:55:52.560354 1857376 upgrade_context.go:86] Version for "ci/latest" is "v1.31.0-alpha.0.983+f44bb5e6e58c31\n"
I0529 17:55:52.677652 1857376 nvidia-gpu.go:144] Nodename: kt2-1717029961633-master, OS Image: Container-Optimized OS from Google
I0529 17:55:52.677662 1857376 nvidia-gpu.go:144] Nodename: kt2-1717029961633-minion-group-2hm0, OS Image: Container-Optimized OS from Google
I0529 17:55:52.677664 1857376 nvidia-gpu.go:144] Nodename: kt2-1717029961633-minion-group-br6z, OS Image: Container-Optimized OS from Google
I0529 17:55:52.677666 1857376 nvidia-gpu.go:144] Nodename: kt2-1717029961633-minion-group-d2g3, OS Image: Container-Optimized OS from Google
I0529 17:55:52.677669 1857376 nvidia-gpu.go:101] Using default local nvidia-driver-installer daemonset manifest.
I0529 17:55:52.743577 1857376 nvidia-gpu.go:112] Successfully created daemonset to install Nvidia drivers.
I0529 17:56:52.803452 1857376 nvidia-gpu.go:115] Failed inside E2E framework:
k8s.io/kubernetes/test/e2e/framework/pod.WaitForPods({0x7b1611463800, 0xc0005fc2d0}, {0x43623f0, 0xc0013008c0}, {0xc000aac7b0, 0x2f}, {{{0x0, 0x0}, {0x0, 0x0}}, ...}, ...)
/home/sean/go/pkg/mod/k8s.io/[email protected]/test/e2e/framework/pod/wait.go:327 +0x625
k8s.io/kubernetes/test/e2e/framework/pod.WaitForPodsWithLabel({0x7b1611463800, 0xc0005fc2d0}, {0x43623f0, 0xc0013008c0}, {0xc000aac7b0, 0x2f}, {0x433ace0?, 0xc000aea8a0?})
/home/sean/go/pkg/mod/k8s.io/[email protected]/test/e2e/framework/pod/wait.go:657 +0x119
k8s.io/kubernetes/test/e2e/framework/resource.WaitForControlledPods({0x7b1611463800, 0xc0005fc2d0}, {0x43623f0, 0xc0013008c0}, {0xc000aac7b0, 0x2f}, {0xc000161e60?, 0xc0008e13c0?}, {{0x3db2e53, 0xa}, ...})
/home/sean/go/pkg/mod/k8s.io/[email protected]/test/e2e/framework/resource/resources.go:249 +0xd8
k8s.io/cloud-provider-gcp/tests/e2e.SetupNVIDIAGPUNode({0x7b1611463800, 0xc0005fc2d0}, 0xc0010a0000, 0x0)
/home/sean/go/src/k8s.io/cloud-provider-gcp/test/e2e/nvidia-gpu.go:114 +0x4cc
k8s.io/cloud-provider-gcp/tests/e2e.(*NvidiaGPUUpgradeTest).Setup(0xc000fd7728?, {0x7b1611463800, 0xc0005fc2d0}, 0xc0010a0000)
/home/sean/go/src/k8s.io/cloud-provider-gcp/test/e2e/nvidia-gpu.go:61 +0x2d
k8s.io/cloud-provider-gcp/tests/e2e.(*chaosMonkeyAdapter).Test(0xc001069480, {0x7b1611463800, 0xc0005fc2d0}, 0xc00095c1e0)
/home/sean/go/src/k8s.io/cloud-provider-gcp/test/e2e/gpu.go:183 +0x1ce
k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do.func1()
/home/sean/go/pkg/mod/k8s.io/[email protected]/test/e2e/chaosmonkey/chaosmonkey.go:95 +0x6c
created by k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do in goroutine 125
/home/sean/go/pkg/mod/k8s.io/[email protected]/test/e2e/chaosmonkey/chaosmonkey.go:92 +0xa5
I0529 17:56:52.803522 1857376 util.go:650] Running ../../cluster/gce/upgrade.sh [-M v1.31.0-alpha.0.983+f44bb5e6e58c31]
I0529 17:57:05.616098 1857376 upgrade_mechanics.go:40] Unexpected error:
<*errors.errorString | 0xc0006c7860>:
error running ../../cluster/gce/upgrade.sh [-M v1.31.0-alpha.0.983+f44bb5e6e58c31]; got error exit status 1, stdout "Fetching the previously installed CoreDNS version\nThe default etcd storage media type in 1.6 has changed from application/json to application/vnd.kubernetes.protobuf.\nDocumentation about the change can be found at https://kubernetes.io/docs/admin/etcd_upgrade.\n\nETCD2 DOES NOT SUPPORT PROTOBUF: If you wish to have to ability to downgrade to etcd2 later application/json must be used.\n\nIt's HIGHLY recommended that etcd be backed up before this step!!\n\nTo enable using json, before running this script set:\nexport STORAGE_MEDIA_TYPE=application/json\n\nTo enable using protobuf, before running this script set:\nexport STORAGE_MEDIA_TYPE=application/vnd.kubernetes.protobuf\n\n", stderr "Using image: cos-109-17800-218-37 from project: cos-cloud as master image\nUsing image: cos-109-17800-218-37 from project: cos-cloud as node image\nUsing image: cos-109-17800-218-37 from project: cos-cloud as master image\nUsing image: cos-109-17800-218-37 from project: cos-cloud as node image\nUsing image: cos-109-17800-218-37 from project: cos-cloud as master image\nUsing image: cos-109-17800-218-37 from project: cos-cloud as node image\nSTORAGE_MEDIA_TYPE must be specified when run non-interactively.\n"
{
s: "error running ../../cluster/gce/upgrade.sh [-M v1.31.0-alpha.0.983+f44bb5e6e58c31]; got error exit status 1, stdout \"Fetching the previously installed CoreDNS version\\nThe default etcd storage media type in 1.6 has changed from application/json to application/vnd.kubernetes.protobuf.\\nDocumentation about the change can be found at https://kubernetes.io/docs/admin/etcd_upgrade.\\n\\nETCD2 DOES NOT SUPPORT PROTOBUF: If you wish to have to ability to downgrade to etcd2 later application/json must be used.\\n\\nIt's HIGHLY recommended that etcd be backed up before this step!!\\n\\nTo enable using json, before running this script set:\\nexport STORAGE_MEDIA_TYPE=application/json\\n\\nTo enable using protobuf, before running this script set:\\nexport STORAGE_MEDIA_TYPE=application/vnd.kubernetes.protobuf\\n\\n\", stderr \"Using image: cos-109-17800-218-37 from project: cos-cloud as master image\\nUsing image: cos-109-17800-218-37 from project: cos-cloud as node image\\nUsing image: cos-109-17800-218-37 from project: cos-cloud as master image\\nUsing image: cos-109-17800-218-37 from project: cos-cloud as node image\\nUsing image: cos-109-17800-218-37 from project: cos-cloud as master image\\nUsing image: cos-109-17800-218-37 from project: cos-cloud as node image\\nSTORAGE_MEDIA_TYPE must be specified when run non-interactively.\\n\"",
}
[FAILED] in [It] - /home/sean/go/src/k8s.io/cloud-provider-gcp/test/e2e/nvidia-gpu.go:115 @ 05/29/24 17:57:05.617
STEP: Destroying namespace "nvidia-gpu-upgrade-sig-node-sig-scheduling-5395" for this suite. @ 05/29/24 17:57:05.618
STEP: Destroying namespace "gpu-upgrade-5929" for this suite. @ 05/29/24 17:57:05.687
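The upgrade.sh error above says that STORAGE_MEDIA_TYPE must be specified when the script runs non-interactively. One possible way to satisfy that from a Go test harness is sketched below, on the assumption that the script is launched as a child process. `withStorageMediaType` and `upgradeCommand` are hypothetical helpers, not part of this PR or of the e2e framework, and the protobuf default is only one of the two values the script's own message lists.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

const storageMediaTypeKey = "STORAGE_MEDIA_TYPE"

// withStorageMediaType returns a copy of env that defines STORAGE_MEDIA_TYPE.
// An existing non-empty value is kept; an empty or missing one is replaced
// by fallback.
func withStorageMediaType(env []string, fallback string) []string {
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if strings.HasPrefix(kv, storageMediaTypeKey+"=") {
			if kv == storageMediaTypeKey+"=" {
				continue // drop an empty definition so the fallback applies
			}
			found = true
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, storageMediaTypeKey+"="+fallback)
	}
	return out
}

// upgradeCommand (hypothetical) prepares, but does not run, the upgrade
// script with STORAGE_MEDIA_TYPE present in its environment.
func upgradeCommand(script, version string, env []string) *exec.Cmd {
	cmd := exec.Command(script, "-M", version)
	cmd.Env = withStorageMediaType(env, "application/vnd.kubernetes.protobuf")
	return cmd
}

func main() {
	cmd := upgradeCommand("../../cluster/gce/upgrade.sh",
		"v1.31.0-alpha.0.983+f44bb5e6e58c31", os.Environ())

	fmt.Println(cmd.Args)
	for _, kv := range cmd.Env {
		if strings.HasPrefix(kv, storageMediaTypeKey+"=") {
			fmt.Println(kv)
		}
	}
}
```

Keeping a caller-supplied value means a job that exports `STORAGE_MEDIA_TYPE=application/json` is not silently overridden. Exporting the variable in the CI job configuration would be an equally valid alternative to setting it in code.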