percona-helm-charts
PMM 3 client breaks deployment in pxc-db v1.18.0 after upgrade
Summary
This bug completely breaks PMM integration after a chart upgrade, because the pmm-client container loops indefinitely without ever starting. The pmm-client:3.3.1 container fails to start in the haproxy pod when deployed with pxc-db Helm chart version 1.18.0. The cause is the use of the wrong key, pmmserverkey, in the cluster secret: it forces pxc operator 1.18.0 to select PMM2 instead of PMM3, which uses the key pmmservertoken as described in the documentation.
⸻
Affected Components
• pxc-db Helm chart: v1.18.0
• pxc-operator Helm chart: v1.18.0
• pmm-client: v3.3.1 (default)
• Kubernetes: EKS v1.3x.x
⸻
Steps to Reproduce
- Deploy pxc operator 1.16.1 and a pxc-db 1.16.1 cluster with haproxy
- Upgrade pxc operator to 1.18.0 and pxc-db to 1.18.0
- Observe logs from pmm-client sidecar in haproxy pods
Not tested, but the bug may also be triggered without an upgrade, just by setting the pmmserverkey value.
⸻
Expected Behavior
pmm-client should successfully reach a running state.
⸻
Actual Behavior
The agent loops with:
pmm-agent is not running.
Config file /usr/local/percona/pmm2/config/pmm-agent.yaml is not writable: no such file or directory.
time="2025-10-07T15:31:43.912+00:00" level=info msg="'pmm-agent setup' exited with 1" component=entrypoint
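A quick way to spot this symptom in captured logs is to look for the PMM 2 config path. The following is only a minimal sketch: the function name uses_pmm2_config is hypothetical and not part of any Percona tooling, and it matches nothing more than the exact path shown in the log above.

```shell
# Hypothetical helper: exits 0 if the log text on stdin mentions the
# PMM 2 config path seen in the failing pmm-client container.
uses_pmm2_config() {
  # -F treats the pattern as a fixed string, -q suppresses output
  grep -qF '/usr/local/percona/pmm2/config/pmm-agent.yaml'
}
```

It could be fed from something like kubectl logs for the haproxy pod's pmm-client container (pod and container names depend on your release); a zero exit status means the container is still pointed at the PMM 2 path.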
⸻
Root Cause
pmm-client tries to load a file that does not exist, because the pxc-operator injects the wrong variable
PMM_AGENT_CONFIG_FILE: /usr/local/percona/pmm2/config/pmm-agent.yaml
into the haproxy StatefulSet. This behaviour is triggered by the pmmserverkey parameter, which is the only PMM key available in the cluster secret.
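The selection behaviour described here can be modelled roughly as follows. This is a sketch of what the reports in this thread suggest, not the operator's actual code: the function name pmm_version_for_keys is hypothetical, and the precedence (pmmserverkey wins whenever it is present) is an assumption inferred from the observed behaviour.

```shell
# Hypothetical model of the observed behaviour, NOT the operator's code.
# Arguments: the key names present in the cluster secret.
# Assumption: pmmserverkey takes precedence over pmmservertoken.
pmm_version_for_keys() {
  has_key=0
  has_token=0
  for k in "$@"; do
    case "$k" in
      pmmserverkey)   has_key=1 ;;
      pmmservertoken) has_token=1 ;;
    esac
  done
  if [ "$has_key" -eq 1 ]; then
    echo pmm2    # PMM 2 path is chosen, even for a PMM 3 client image
  elif [ "$has_token" -eq 1 ]; then
    echo pmm3
  else
    echo none    # no PMM credentials in the secret
  fi
}
```

Under this model, a secret that only ever receives pmmserverkey from the chart can never select the PMM 3 path, which matches the failure seen in the haproxy pod.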
⸻
Proposed Fix
• Update the pxc-db chart to also inject the pmmservertoken value and make it the default
• Add validation logic that checks the PMM client version against the secret name and image version
• Improve the documentation on PMM compatibility
⸻
Impact
This issue prevents the pmm-client container in the PXC cluster from starting after an upgrade, so the upgrade can break your monitoring.
https://perconadev.atlassian.net/browse/K8SPXC-1726
Hi, I did a fresh installation using the latest charts with secrets.passwords.pmmserverkey set, because if you leave pmmserverkey out of your Helm values, it throws:
"error": "can't enable PMM2: either pmmserverkey key doesn't exist in the secrets, or secrets and internal secrets are out of sync"
Following your description, I manually patched the internal secret (just seconds after installing the pxc-db Helm chart) to also inject the token under the key pmmservertoken, like this:
PMMKEY=glsa_slUzXewB1fsoTueZHNl7o3IoTxKw8i2m_6e185f9a
kubectl patch secret internal-pxc-pxc-db -n pxc --type merge -p "{\"data\":{\"pmmservertoken\":\"$(echo -n $PMMKEY | base64)\"}}"
pxc1 ~ > kubectl -n pxc get secret internal-pxc-pxc-db -o yaml
apiVersion: v1
data:
  monitor: T0RJNUxXVmpPRFV0Tkc=
  operator: T0RJNUxXVmpPRFV0Tkc=
  pmmserverkey: Z2xzYV9zbFV6WGV3QjFmc29UdWVaSE5sN28zSW9UeEt3OGkybV82ZTE4NWY5YQ==
  pmmservertoken: Z2xzYV9zbFV6WGV3QjFmc29UdWVaSE5sN28zSW9UeEt3OGkybV82ZTE4NWY5YQ==
  proxyadmin: T0RJNUxXVmpPRFV0Tkc=
  replication: T0RJNUxXVmpPRFV0Tkc=
  root: T0RJNUxXVmpPRFV0Tkc=
  xtrabackup: T0RJNUxXVmpPRFV0Tkc=
kind: Secret
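The merge-patch body used in the kubectl patch command above can also be built by a small helper, which avoids hand-escaping the JSON. This is a sketch only: the function name secret_patch_json is hypothetical, and it assumes the key and value contain no characters that need JSON escaping.

```shell
# Hypothetical helper: print a merge-patch JSON body that sets one key
# in a Secret's data map.
# Assumes key and value need no JSON escaping.
secret_patch_json() {
  key=$1
  value=$2
  # printf avoids a trailing newline in the encoded value; tr strips the
  # line wrapping some base64 implementations add for long input.
  encoded=$(printf '%s' "$value" | base64 | tr -d '\n')
  printf '{"data":{"%s":"%s"}}' "$key" "$encoded"
}
```

With the names used above, the output could be passed as kubectl patch secret internal-pxc-pxc-db -n pxc --type merge -p "$(secret_patch_json pmmservertoken "$PMMKEY")".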
I also had it in my values.yaml
secrets:
  passwords:
    root: "ODI5LWVjODUtNG"
    xtrabackup: "ODI5LWVjODUtNG"
    monitor: "ODI5LWVjODUtNG"
    proxyadmin: "ODI5LWVjODUtNG"
    operator: "ODI5LWVjODUtNG"
    replication: "ODI5LWVjODUtNG"
    pmmserverkey: "glsa_slUzXewB1fsoTueZHNl7o3IoTxKw8i2m_6e185f9a"
    pmmservertoken: "glsa_slUzXewB1fsoTueZHNl7o3IoTxKw8i2m_6e185f9a"
but pmmservertoken was ignored completely because of the chart's template for the secret.
Nonetheless, even after manually patching the secret, the pmm-client container unfortunately still thinks it is running PMM2 and throws
Config file /usr/local/percona/pmm2/config/pmm-agent.yaml is not writable: no such file or directory.
This is because pmmserverkey must not exist in the secret at all.
I opened this issue https://github.com/percona/percona-xtradb-cluster-operator/issues/2224 before I found your Jira issue and this one.
Anyway, I managed to get the pxc-db registered with PMM via the manual patch, after removing pmmserverkey from my values.yaml.
So it seems that fixing the Helm chart template for the secret is not enough, and the operator's lookup of pmmserverkey also needs to change to solve the problem?
I'd happily help out by testing the fresh installation routine for you, or anything else, if it helps.
Hi @sgohl, in my analysis the operator already works well: it manages the CRD correctly and sets the PMM_AGENT_CONFIG_FILE env with the correct path based on the existence of the pmmserverkey or pmmservertoken value. It's only the chart that has the wrong CRD template.