redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

Failure in k8s-operator endpoint-template (PostStart hook)

Open jcsp opened this issue 2 years ago • 9 comments

https://buildkite.com/redpanda/redpanda/builds/18840#01849373-8425-495a-95e6-7c462a832601

--- FAIL: kuttl (2273.54s)
    --- FAIL: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/resources-redpanda (9.51s)
        --- PASS: kuttl/harness/additional-configuration (218.28s)
        --- PASS: kuttl/harness/redpanda-schema-registry-sasl (401.49s)
        --- PASS: kuttl/harness/produce-tls (12.22s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume-tls-client (10.17s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume-sasl (46.31s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume (79.35s)
        --- PASS: kuttl/harness/node-select-tolerations (6.38s)
        --- PASS: kuttl/harness/external-connectivity (9.23s)
        --- PASS: kuttl/harness/explicit-ports-tls (10.15s)
        --- PASS: kuttl/harness/redpanda-schema-registry (649.27s)
        --- PASS: kuttl/harness/create-topic-with-client-auth (11.27s)
        --- FAIL: kuttl/harness/endpoint-template (300.70s)
        --- PASS: kuttl/harness/create-topic-given-issuer (47.35s)
        --- PASS: kuttl/harness/create-topic-given-issuer-with-client-auth (55.63s)
        --- PASS: kuttl/harness/create-topic-given-cert-secret (52.29s)
        --- PASS: kuttl/harness/create-topic-given-cm-secret-client-auth (56.34s)
        --- PASS: kuttl/harness/console-admin-api (164.97s)
        --- PASS: kuttl/harness/console-kafka-mtls (172.62s)
        --- PASS: kuttl/harness/admin-api-tls-client-auth (11.22s)
        --- PASS: kuttl/harness/admin-api-tls (12.21s)
        --- PASS: kuttl/harness/admin-api (10.09s)
        --- PASS: kuttl/harness/centralized-configuration-bootstrap (43.42s)
        --- PASS: kuttl/harness/confluent-schema-registry (60.37s)
        --- PASS: kuttl/harness/console (164.46s)
        --- PASS: kuttl/harness/centralized-configuration-upgrade (117.20s)
        --- PASS: kuttl/harness/centralized-configuration-tls (218.49s)
        --- PASS: kuttl/harness/update-image-and-node-port (294.65s)
        --- PASS: kuttl/harness/update-image-tls-client-auth (291.80s)
        --- PASS: kuttl/harness/update-image-tls (238.90s)
        --- PASS: kuttl/harness/update (85.01s)
        --- PASS: kuttl/harness/centralized-configuration-drift (96.50s)
        --- PASS: kuttl/harness/superusers-prefix (194.81s)
FAIL

jcsp avatar Nov 21 '22 12:11 jcsp

This is miss leading. The amount of logs generated by kuttl from unstable k8s test hides the true problem in the main test suite.

Taken from raw output.

--- FAIL: kuttl (2273.54s)
    --- FAIL: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/resources-redpanda (9.51s)
        --- PASS: kuttl/harness/additional-configuration (218.28s)
        --- PASS: kuttl/harness/redpanda-schema-registry-sasl (401.49s)
        --- PASS: kuttl/harness/produce-tls (12.22s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume-tls-client (10.17s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume-sasl (46.31s)
        --- PASS: kuttl/harness/pandaproxy-produce-consume (79.35s)
        --- PASS: kuttl/harness/node-select-tolerations (6.38s)
        --- PASS: kuttl/harness/external-connectivity (9.23s)
        --- PASS: kuttl/harness/explicit-ports-tls (10.15s)
        --- PASS: kuttl/harness/redpanda-schema-registry (649.27s)
        --- PASS: kuttl/harness/create-topic-with-client-auth (11.27s)
        --- FAIL: kuttl/harness/endpoint-template (300.70s)
        --- PASS: kuttl/harness/create-topic-given-issuer (47.35s)
        --- PASS: kuttl/harness/create-topic-given-issuer-with-client-auth (55.63s)
        --- PASS: kuttl/harness/create-topic-given-cert-secret (52.29s)
        --- PASS: kuttl/harness/create-topic-given-cm-secret-client-auth (56.34s)
        --- PASS: kuttl/harness/console-admin-api (164.97s)
        --- PASS: kuttl/harness/console-kafka-mtls (172.62s)
        --- PASS: kuttl/harness/admin-api-tls-client-auth (11.22s)
        --- PASS: kuttl/harness/admin-api-tls (12.21s)
        --- PASS: kuttl/harness/admin-api (10.09s)
        --- PASS: kuttl/harness/centralized-configuration-bootstrap (43.42s)
        --- PASS: kuttl/harness/confluent-schema-registry (60.37s)
        --- PASS: kuttl/harness/console (164.46s)
        --- PASS: kuttl/harness/centralized-configuration-upgrade (117.20s)
        --- PASS: kuttl/harness/centralized-configuration-tls (218.49s)
        --- PASS: kuttl/harness/update-image-and-node-port (294.65s)
        --- PASS: kuttl/harness/update-image-tls-client-auth (291.80s)
        --- PASS: kuttl/harness/update-image-tls (238.90s)
        --- PASS: kuttl/harness/update (85.01s)
        --- PASS: kuttl/harness/centralized-configuration-drift (96.50s)
        --- PASS: kuttl/harness/superusers-prefix (194.81s)
FAIL

Kuttl returned exit code 1. Creating artifacts tarball

RafalKorepta avatar Nov 21 '22 13:11 RafalKorepta

Events from kuttle output doesn't show the root cause of pod-0 not being scheduled

    logger.go:42: 06:03:12 | endpoint-template | endpoint-template events from ns kuttl-test-select-louse:
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-admin-client		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-kafka-root-certificate-7wd6c		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-kafka-root-certificate-7wd6c		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-kafka-root-certificate		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-kafka-root-certificate		Generated	Stored new private key in temporary Secret resource "endpoint-template-kafka-root-certificate-ldcxk"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-kafka-root-certificate		Requested	Created new CertificateRequest resource "endpoint-template-kafka-root-certificate-7wd6c"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-kafka-root-certificate		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Issuer.cert-manager.io endpoint-template-kafka-root-issuer		KeyPairVerified	Signing CA verified		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-operator-client		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-node		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-root-certificate		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Warning	Issuer.cert-manager.io endpoint-template-proxy-root-issuer		ErrGetKeyPair	Error getting keypair for CA issuer: secret "endpoint-template-proxy-root-certificate" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Warning	Issuer.cert-manager.io endpoint-template-proxy-root-issuer		ErrInitIssuer	Error initializing issuer: secret "endpoint-template-proxy-root-certificate" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-redpanda		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:11 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-user-client		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-0		WaitForFirstConsumer	waiting for first consumer to be created before binding		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-0		ExternalProvisioning	waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-0		Provisioning	External provisioner is provisioning volume for claim "kuttl-test-select-louse/datadir-endpoint-template-0"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-1		WaitForFirstConsumer	waiting for first consumer to be created before binding		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-1		ExternalProvisioning	waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-1		Provisioning	External provisioner is provisioning volume for claim "kuttl-test-select-louse/datadir-endpoint-template-1"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-2		WaitForFirstConsumer	waiting for first consumer to be created before binding		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-2		ExternalProvisioning	waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-2		Provisioning	External provisioner is provisioning volume for claim "kuttl-test-select-louse/datadir-endpoint-template-2"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-admin-client-vrzps		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-admin-client-vrzps		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-admin-client		Generated	Stored new private key in temporary Secret resource "endpoint-template-admin-client-pzjjd"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-admin-client		Requested	Created new CertificateRequest resource "endpoint-template-admin-client-vrzps"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-admin-client		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-operator-client-gtb6j		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-operator-client-gtb6j		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-operator-client		Generated	Stored new private key in temporary Secret resource "endpoint-template-operator-client-jt6hs"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-operator-client		Requested	Created new CertificateRequest resource "endpoint-template-operator-client-gtb6j"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-operator-client		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-client		Issuing	Issuing certificate as Secret does not exist		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-client		Generated	Stored new private key in temporary Secret resource "endpoint-template-proxy-api-client-xlc78"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-node		Generated	Stored new private key in temporary Secret resource "endpoint-template-proxy-api-node-l5whb"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-root-certificate-6r8wx		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-root-certificate-6r8wx		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-root-certificate		Generated	Stored new private key in temporary Secret resource "endpoint-template-proxy-root-certificate-dbbm2"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-root-certificate		Requested	Created new CertificateRequest resource "endpoint-template-proxy-root-certificate-6r8wx"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-root-certificate		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-redpanda-fwzf7		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-redpanda-fwzf7		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-redpanda		Generated	Stored new private key in temporary Secret resource "endpoint-template-redpanda-5pntk"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-redpanda		Requested	Created new CertificateRequest resource "endpoint-template-redpanda-fwzf7"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-redpanda		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-user-client		Generated	Stored new private key in temporary Secret resource "endpoint-template-user-client-hxcdh"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	PodDisruptionBudget.policy endpoint-template		NoPods	No matching pods found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Claim datadir-endpoint-template-0 Pod endpoint-template-0 in StatefulSet endpoint-template success		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Pod endpoint-template-0 in StatefulSet endpoint-template successful		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Claim datadir-endpoint-template-1 Pod endpoint-template-1 in StatefulSet endpoint-template success		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Pod endpoint-template-1 in StatefulSet endpoint-template successful		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Claim datadir-endpoint-template-2 Pod endpoint-template-2 in StatefulSet endpoint-template success		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:12 +0000 UTC	Normal	StatefulSet.apps endpoint-template		SuccessfulCreate	create Pod endpoint-template-2 in StatefulSet endpoint-template successful		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-client-c6tvl		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-client-c6tvl		IssuerNotReady	Referenced issuer does not have a Ready status condition		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-client		Requested	Created new CertificateRequest resource "endpoint-template-proxy-api-client-c6tvl"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-node-nkxz5		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-node-nkxz5		IssuerNotReady	Referenced issuer does not have a Ready status condition		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-node		Requested	Created new CertificateRequest resource "endpoint-template-proxy-api-node-nkxz5"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-user-client-nkxj8		cert-manager.io	Certificate request has been approved by cert-manager.io		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-user-client-nkxj8		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:13 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-user-client		Requested	Created new CertificateRequest resource "endpoint-template-user-client-nkxj8"		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:14 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-user-client		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-0		ProvisioningSucceeded	Successfully provisioned volume pvc-dd6f0158-409c-419b-bfe1-fa1686cfcd87		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-1		ProvisioningSucceeded	Successfully provisioned volume pvc-480df756-64fd-4d71-9968-d7ec2088faa6		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Normal	PersistentVolumeClaim datadir-endpoint-template-2		ProvisioningSucceeded	Successfully provisioned volume pvc-d9ab6a89-740e-4369-8b7d-6962fcc6022e		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Normal	Pod endpoint-template-0		Scheduled	Successfully assigned kuttl-test-select-louse/endpoint-template-0 to kind-worker		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Warning	Pod endpoint-template-0		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxyca" : secret "endpoint-template-proxy-api-client" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Warning	Pod endpoint-template-0		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxycert" : secret "endpoint-template-proxy-api-node" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Normal	Pod endpoint-template-1		Scheduled	Successfully assigned kuttl-test-select-louse/endpoint-template-1 to kind-worker6		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Warning	Pod endpoint-template-1		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxycert" : secret "endpoint-template-proxy-api-node" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:15 +0000 UTC	Warning	Pod endpoint-template-1		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxyca" : secret "endpoint-template-proxy-api-client" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	Pod endpoint-template-2		Scheduled	Successfully assigned kuttl-test-select-louse/endpoint-template-2 to kind-worker8		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Warning	Pod endpoint-template-2		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxyca" : secret "endpoint-template-proxy-api-client" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Warning	Pod endpoint-template-2		FailedMount	MountVolume.SetUp failed for volume "tlspandaproxycert" : secret "endpoint-template-proxy-api-node" not found		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-client-c6tvl		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-client		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	CertificateRequest.cert-manager.io endpoint-template-proxy-api-node-nkxz5		CertificateIssued	Certificate fetched from issuer successfully		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	Certificate.cert-manager.io endpoint-template-proxy-api-node		Issuing	The certificate has been successfully issued		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:16 +0000 UTC	Normal	Issuer.cert-manager.io endpoint-template-proxy-root-issuer		KeyPairVerified	Signing CA verified		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:18 +0000 UTC	Normal	Pod endpoint-template-2.spec.initContainers{redpanda-configurator}		Pulled	Container image "localhost/configurator:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:18 +0000 UTC	Normal	Pod endpoint-template-2.spec.initContainers{redpanda-configurator}		Created	Created container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:18 +0000 UTC	Normal	Pod endpoint-template-2.spec.initContainers{redpanda-configurator}		Started	Started container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-0.spec.initContainers{redpanda-configurator}		Pulled	Container image "localhost/configurator:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-0.spec.initContainers{redpanda-configurator}		Created	Created container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-0.spec.initContainers{redpanda-configurator}		Started	Started container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-1.spec.initContainers{redpanda-configurator}		Pulled	Container image "localhost/configurator:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-1.spec.initContainers{redpanda-configurator}		Created	Created container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-1.spec.initContainers{redpanda-configurator}		Started	Started container redpanda-configurator		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-2.spec.containers{redpanda}		Pulled	Container image "localhost/redpanda:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-2.spec.containers{redpanda}		Created	Created container redpanda		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:19 +0000 UTC	Normal	Pod endpoint-template-2.spec.containers{redpanda}		Started	Started container redpanda		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-0.spec.containers{redpanda}		Pulled	Container image "localhost/redpanda:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-0.spec.containers{redpanda}		Created	Created container redpanda		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-0.spec.containers{redpanda}		Started	Started container redpanda		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-1.spec.containers{redpanda}		Pulled	Container image "localhost/redpanda:dev" already present on machine		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-1.spec.containers{redpanda}		Created	Created container redpanda		
    logger.go:42: 06:03:12 | endpoint-template | 2022-11-20 05:58:20 +0000 UTC	Normal	Pod endpoint-template-1.spec.containers{redpanda}		Started	Started container redpanda

RafalKorepta avatar Nov 21 '22 13:11 RafalKorepta

The kubectl get pod output shows the following problem

...
            {
              "lastProbeTime": null,
              "lastTransitionTime": "2022-11-20T05:58:15Z",
              "message": "containers with unready status: [redpanda]",
              "reason": "ContainersNotReady",
              "status": "False",
              "type": "Ready"
            },
            {
              "lastProbeTime": null,
              "lastTransitionTime": "2022-11-20T05:58:15Z",
              "message": "containers with unready status: [redpanda]",
              "reason": "ContainersNotReady",
              "status": "False",
              "type": "ContainersReady"
            }
          ],
          "containerStatuses": [
            {
              "image": "localhost/redpanda:dev",
              "imageID": "",
              "lastState": {},
              "name": "redpanda",
              "ready": false,
              "restartCount": 0,
              "started": false,
              "state": {
                "waiting": {
                  "reason": "PodInitializing"
                }
              }
            }
          ],
...

RafalKorepta avatar Nov 21 '22 13:11 RafalKorepta

Nov 20 06:05:15 kind-worker kubelet[258]: E1120 06:05:15.425494     258 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"redpanda\" with PostStartHookError: command '/scripts/postStart.sh' exited with 137: : \"Exec lifecycle hook ([/scripts/postStart.sh]) for Container \\\"redpanda\\\" in Pod \\\"endpoint-template-0_kuttl-test-select-louse(35d2ae0a-bd23-41f7-81a4-89ac99f99f4f)\\\" failed - error: command '/scripts/postStart.sh' exited with 137: , message: \\\"\\\"\"" pod="kuttl-test-select-louse/endpoint-template-0" podUID=35d2ae0a-bd23-41f7-81a4-89ac99f99f4f

For what ever reason the postStart script failed.

RafalKorepta avatar Nov 22 '22 08:11 RafalKorepta

Second time https://buildkite.com/redpanda/redpanda/builds/18895#01849d6c-9736-4275-b872-dd1116132d33 the same problem

--- FAIL: kuttl/harness/endpoint-template (601.47s)

RafalKorepta avatar Nov 22 '22 08:11 RafalKorepta

Third time https://buildkite.com/redpanda/redpanda/builds/18875#01849b55-2a33-426c-9267-ae7b4410d220 the same problem

From this test run one pod named ednpoint-template-1 couldn't finish PostStart hook. The following logs was gathered from the Redpanda container:

2022-11-21T19:11:14.036075263Z stderr F INFO  2022-11-21 19:11:14,035 [shard 0] cluster - cluster_discovery.cc:147 - Requesting node ID for node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261} from {host: endpoint-template-0.endpoint-template.kuttl-test-helped-gator.svc.cluster.local., port: 33145}
2022-11-21T19:11:14.036108999Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-0.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036312853Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4
2022-11-21T19:11:14.036370634Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] rpc - transport.cc:308 - RPC Client probes: { requests_sent: 0, requests_pending: 0, requests_completed: 0, request_errors: 0, request_timeouts: 0, in_bytes: 0, out_bytes: 0, connects: 0, connections: 0, connection_errors: 1, read_dispatch_errors: 0, corrupted_headers: 0, server_correlation_errors: 0, client_correlation_errors: 0, requests_blocked_memory: 0 }
2022-11-21T19:11:14.036411964Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] cluster - cluster_discovery.cc:169 - Error registering node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261}, retrying: std::__1::system_error (error C-Ares:4, endpoint-template-0.endpoint-template.kuttl-test-helped-gator.svc.cluster.local.: Not found)
2022-11-21T19:11:14.036415432Z stderr F INFO  2022-11-21 19:11:14,036 [shard 0] cluster - cluster_discovery.cc:147 - Requesting node ID for node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261} from {host: endpoint-template-1.endpoint-template.kuttl-test-helped-gator.svc.cluster.local., port: 33145}
2022-11-21T19:11:14.036420704Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-1.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036604935Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4
2022-11-21T19:11:14.036633878Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] rpc - transport.cc:308 - RPC Client probes: { requests_sent: 0, requests_pending: 0, requests_completed: 0, request_errors: 0, request_timeouts: 0, in_bytes: 0, out_bytes: 0, connects: 0, connections: 0, connection_errors: 1, read_dispatch_errors: 0, corrupted_headers: 0, server_correlation_errors: 0, client_correlation_errors: 0, requests_blocked_memory: 0 }
2022-11-21T19:11:14.036663665Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] cluster - cluster_discovery.cc:169 - Error registering node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261}, retrying: std::__1::system_error (error C-Ares:4, endpoint-template-1.endpoint-template.kuttl-test-helped-gator.svc.cluster.local.: Not found)
2022-11-21T19:11:14.036678215Z stderr F INFO  2022-11-21 19:11:14,036 [shard 0] cluster - cluster_discovery.cc:147 - Requesting node ID for node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261} from {host: endpoint-template-2.endpoint-template.kuttl-test-helped-gator.svc.cluster.local., port: 33145}
2022-11-21T19:11:14.036689774Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-2.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036817588Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4
2022-11-21T19:11:14.036851303Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] rpc - transport.cc:308 - RPC Client probes: { requests_sent: 0, requests_pending: 0, requests_completed: 0, request_errors: 0, request_timeouts: 0, in_bytes: 0, out_bytes: 0, connects: 0, connections: 0, connection_errors: 1, read_dispatch_errors: 0, corrupted_headers: 0, server_correlation_errors: 0, client_correlation_errors: 0, requests_blocked_memory: 0 }
2022-11-21T19:11:14.03687066Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] cluster - cluster_discovery.cc:169 - Error registering node UUID {6d0bca35-3ff2-4a4e-b3ce-720843bf0261}, retrying: std::__1::system_error (error C-Ares:4, endpoint-template-2.endpoint-template.kuttl-test-helped-gator.svc.cluster.local.: Not found)

RafalKorepta avatar Nov 22 '22 08:11 RafalKorepta

The script that tries to remove maintenance mode

#!/usr/bin/env bash
set -e

until NODE_ID=$(curl --silent --fail http://${POD_NAME}.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local:9644/v1/node_config | grep -o '\"node_id\":[^,}]*' | grep -o '[^: ]*$'); do
	sleep 0.5
done
echo "Clearing maintenance mode on node ${NODE_ID}"
until [ "${status:-}" = "200" ] || [ "${status:-}" = "400" ]; do
	status=$(curl -X DELETE --silent -o /dev/null -w "%{http_code}" http://${POD_NAME}.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local:9644/v1/brokers/${NODE_ID}/maintenance)
	sleep 0.5
done

RafalKorepta avatar Nov 22 '22 13:11 RafalKorepta

Redpanda config

redpanda:
    data_directory: /var/lib/redpanda/data
    empty_seed_starts_cluster: false
    seed_servers:
        - host:
            address: endpoint-template-0.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
            port: 33145
        - host:
            address: endpoint-template-1.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
            port: 33145
        - host:
            address: endpoint-template-2.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
            port: 33145
    rpc_server:
        address: 0.0.0.0
        port: 33145
    kafka_api:
        - address: 0.0.0.0
          port: 9092
          name: kafka
          authentication_method: none
        - address: 0.0.0.0
          port: 9093
          name: kafka-external
          authentication_method: none
    kafka_api_tls:
        - name: kafka-external
          key_file: /etc/tls/certs/tls.key
          cert_file: /etc/tls/certs/tls.crt
          truststore_file: /etc/tls/certs/ca/ca.crt
          enabled: true
          require_client_auth: true
    admin:
        - address: 0.0.0.0
          port: 9644
          name: admin
    advertised_rpc_api:
        address: endpoint-template-0.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
        port: 33145
    advertised_kafka_api:
        - address: endpoint-template-0.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
          port: 9092
          name: kafka
        - address: 0-5c2a3f8f4c-kafka.example.com
          port: 30804
          name: kafka-external
    developer_mode: true
    auto_create_topics_enabled: true
    cloud_storage_segment_max_upload_interval_sec: 1800
    default_topic_replications: 3
    enable_rack_awareness: true
    fetch_reads_debounce_timeout: 10
    group_topic_partitions: 3
    id_allocator_replication: 3
    log_segment_size: 536870912
    storage_min_free_bytes: 10485760
    topic_partitions_per_shard: 1000
    transaction_coordinator_replication: 3
rpk:
    tune_network: true
    tune_disk_scheduler: true
    tune_disk_nomerges: true
    tune_disk_write_cache: true
    tune_disk_irq: true
    tune_cpu: true
    tune_aio_events: true
    tune_clocksource: true
    tune_swappiness: true
    coredump_dir: /var/lib/redpanda/coredump
    tune_ballast_file: true
    overprovisioned: true
pandaproxy:
    pandaproxy_api:
        - address: 0.0.0.0
          port: 8082
          name: proxy
        - address: 0.0.0.0
          port: 8083
          name: proxy-external
    pandaproxy_api_tls:
        - name: proxy-external
          key_file: /etc/tls/certs/pandaproxy/tls.key
          cert_file: /etc/tls/certs/pandaproxy/tls.crt
          truststore_file: /etc/tls/certs/pandaproxy/ca/ca.crt
          enabled: true
          require_client_auth: true
    advertised_pandaproxy_api:
        - address: endpoint-template-0.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
          port: 8082
          name: proxy
        - address: 0-5c2a3f8f4c-pandaproxy.example.com
          port: 32576
          name: proxy-external
pandaproxy_client:
    brokers:
        - address: endpoint-template-0.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
          port: 9092
        - address: endpoint-template-1.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
          port: 9092
        - address: endpoint-template-2.endpoint-template.kuttl-test-composed-jackass.svc.cluster.local.
          port: 9092
schema_registry: {}

RafalKorepta avatar Nov 22 '22 13:11 RafalKorepta

From Redpanda's perspective, a couple observations:

  • this node believes it isn't a cluster founder, meaning it either found that another node has a cluster UUID (only possible if we have previously been able to send RPCs), or it's not configured as a part of the seed_servers list
  • it looks like this is a brand new node that doesn't yet have a node ID (the code path hit means that we're trying to get assigned a node ID because we don't have one stored locally yet)

Just curious if that aligns with what we think the test is doing; is it adding a brand new node to a cluster? For a brand new node, is it possible to skip the maintenance mode disabling? More broadly speaking, if the issue is that we can't send any RPCs while initializing, that seems problematic, and maybe we need to expose a different readiness endpoint for these hooks to use.

2022-11-21T19:11:14.036108999Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-0.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036312853Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4
...
2022-11-21T19:11:14.036420704Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-1.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036604935Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4
...
2022-11-21T19:11:14.036689774Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query name endpoint-template-2.endpoint-template.kuttl-test-helped-gator.svc.cluster.local. (INET)
2022-11-21T19:11:14.036817588Z stderr F DEBUG 2022-11-21 19:11:14,036 [shard 0] dns_resolver - Query failed: 4

It looks like the node is having trouble sending RPCs to any of the seed nodes. Is that something that we should be expecting when initializing a pod?

andrwng avatar Nov 22 '22 17:11 andrwng

once more in a feature branch: https://buildkite.com/redpanda/redpanda/builds/18931#0184a033-7ea9-4375-a819-226a353770d7

dlex avatar Nov 22 '22 18:11 dlex

Another thing I'm confused about is why this is only happening only sometimes, if the issue truly is that DNS setup cannot proceed until the node exposes its admin endpoint.

andrwng avatar Nov 22 '22 18:11 andrwng

Just for posterity, exited with 137 means that it was still trying with the pod was deleted.

joejulian avatar Nov 23 '22 00:11 joejulian

It looks like the node is having trouble sending RPCs to any of the seed nodes. Is that something that we should be expecting when initializing a pod?

Thanks @andrwng I will drill down with the DNS (core dns) pod if it was causing the problem.

Just curious if that aligns with what we think the test is doing; is it adding a brand new node to a cluster?

This tests is related to setting up the external advertised kafka api DNS using some template mechanism. It happens that this particular test will create 3 Redpanda cluster. Others tries to create only 1 Redpanda cluster.

For a brand new node, is it possible to skip the maintenance mode disabling?

It's not that easy. We would need to implement some hacks around when node is considered formed and when not. If we can not use Admin API reliably then I we might end up with some kind of workaround.

More broadly speaking, if the issue is that we can't send any RPCs while initializing, that seems problematic, and maybe we need to expose a different readiness endpoint for these hooks to use.

We are not using readiness probe to put Redpanda broker out of maintenance mode. It's PostStart hook https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks

RafalKorepta avatar Nov 23 '22 08:11 RafalKorepta

DNS containers doesn't have any useful logs. I don't want to blame kind cluster or buildkite environment, but I run out of options.

Maybe we have too many kind nodes, but it's a stretch hypothesis.

RafalKorepta avatar Nov 23 '22 14:11 RafalKorepta

Another occurrence: https://buildkite.com/redpanda/redpanda/builds/19160#0184bd5a-bcef-427d-86f6-47c3cb00947f/6-168

RafalKorepta avatar Nov 28 '22 14:11 RafalKorepta

Another: https://buildkite.com/redpanda/redpanda/builds/19308#0184cb41-789f-4675-95f1-ac85941c890e/1464-1492

RafalKorepta avatar Dec 01 '22 14:12 RafalKorepta

One more: https://buildkite.com/redpanda/redpanda/builds/20442#01856bee-53ee-4718-add0-9b42681a2f6c

dlex avatar Jan 01 '23 20:01 dlex

We need to collect previous logs from pods. All we can see is that pod/endpoint-template-1's initContainer was running at the time we collected logs.

joejulian avatar Jan 13 '23 00:01 joejulian

We need to collect previous logs from pods. All we can see is that pod/endpoint-template-1's initContainer was running at the time we collected logs.

can that be done automatically and added into the buildkite artifacts so that we can access them?

dotnwat avatar Jan 13 '23 02:01 dotnwat

That will be the next step when this next has someone working on it, yes.

joejulian avatar Jan 24 '23 17:01 joejulian

Based on 1 week of failures there was no failures in endpoint-template. I will close this issue. If it reappears I will re-open.

RafalKorepta avatar Feb 06 '23 17:02 RafalKorepta