percona-helm-charts

PMM 3 client breaks deployment in pxc-db v1.18.0 after upgrade

Open ynuyasha opened this issue 1 month ago • 3 comments

Summary

This bug completely breaks PMM integration after a chart upgrade, because the pmm-client container loops indefinitely without ever starting. When deployed with pxc-db Helm chart version 1.18.0, the pmm-client:3.3.1 container fails to start in the haproxy pod. The cause is the use of the wrong key, pmmserverkey, in the cluster secret, which forces PXC operator 1.18.0 to select PMM 2 instead of PMM 3. PMM 3 uses the key pmmservertoken, as described in the documentation.

Affected Components

• pxc-db Helm chart: v1.18.0
• pxc-operator Helm chart: v1.18.0
• pmm-client: v3.3.1 (default)
• Kubernetes: EKS v1.3x.x

Steps to Reproduce

  1. Deploy pxc operator 1.16.1 and a pxc-db 1.16.1 cluster with haproxy
  2. Upgrade the pxc operator to 1.18.0 and pxc-db to 1.18.0
  3. Observe logs from pmm-client sidecar in haproxy pods

Not tested, but the bug may also be triggered without an upgrade, simply by setting the pmmserverkey value.

Expected Behavior

pmm-client should successfully reach a running state.

Actual Behavior

The agent loops with:

pmm-agent is not running.
Config file /usr/local/percona/pmm2/config/pmm-agent.yaml is not writable: no such file or directory.
time="2025-10-07T15:31:43.912+00:00" level=info msg="'pmm-agent setup' exited with 1" component=entrypoint

Root Cause

pmm-client tries to load a file that does not exist because the pxc-operator injects the wrong variable, PMM_AGENT_CONFIG_FILE: /usr/local/percona/pmm2/config/pmm-agent.yaml, into the haproxy StatefulSet. This behaviour is triggered by the pmmserverkey parameter, which is the only PMM key available in the cluster secret.
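To confirm this on an affected cluster, one can read the injected PMM_AGENT_CONFIG_FILE value and check which PMM generation its path implies. The following is a minimal sketch, not an official diagnostic: the function name pmm_generation_from_path is hypothetical, and the StatefulSet name, namespace, and container name in the commented usage are assumptions that must be adapted to your release.

```shell
# Classify a PMM_AGENT_CONFIG_FILE value by the PMM generation its path implies.
# Only the pmm2 path appears in this issue; any other non-empty value is reported as "other".
pmm_generation_from_path() {
  case "$1" in
    */percona/pmm2/*) echo "pmm2" ;;
    "")               echo "unset" ;;
    *)                echo "other" ;;
  esac
}

# Hypothetical usage against a live cluster (resource names are assumptions):
# value=$(kubectl -n pxc get statefulset pxc-pxc-db-haproxy \
#   -o jsonpath='{.spec.template.spec.containers[?(@.name=="pmm-client")].env[?(@.name=="PMM_AGENT_CONFIG_FILE")].value}')
# pmm_generation_from_path "$value"
```

A result of pmm2 while the image is pmm-client:3.3.1 would match the mismatch described here.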

Proposed Fix

• Update the pxc-db chart to also inject the pmmservertoken value and make it the default
• Add validation logic that checks the PMM client version against the secret name and image version
• Improve the documentation on PMM compatibility

Impact

This issue prevents the pmm-client container in the PXC cluster from starting after an upgrade, so the upgrade can break your monitoring.

ynuyasha • Oct 07 '25 16:10

https://perconadev.atlassian.net/browse/K8SPXC-1726

ynuyasha • Oct 07 '25 16:10

Hi, I did a fresh installation using the latest charts with secrets.passwords.pmmserverkey set, because if you leave pmmserverkey out of your Helm values, it throws:

"error": "can't enable PMM2: either pmmserverkey key doesn't exist in the secrets, or secrets and internal secrets are out of sync"

Following your report, I manually patched the internal secret (just seconds after installing the pxc-db Helm chart) to also inject the token under the key pmmservertoken, like this:

PMMKEY=glsa_slUzXewB1fsoTueZHNl7o3IoTxKw8i2m_6e185f9a
kubectl patch secret internal-pxc-pxc-db -n pxc --type merge   -p "{\"data\":{\"pmmservertoken\":\"$(echo -n $PMMKEY | base64)\"}}"

pxc1 ~ > kubectl -n pxc get secret internal-pxc-pxc-db -o yaml
apiVersion: v1
data:
  monitor: T0RJNUxXVmpPRFV0Tkc=
  operator: T0RJNUxXVmpPRFV0Tkc=
  pmmserverkey: Z2xzYV9zbFV6WGV3QjFmc29UdWVaSE5sN28zSW9UeEt3OGkybV82ZTE4NWY5YQ==
  pmmservertoken: Z2xzYV9zbFV6WGV3QjFmc29UdWVaSE5sN28zSW9UeEt3OGkybV82ZTE4NWY5YQ==
  proxyadmin: T0RJNUxXVmpPRFV0Tkc=
  replication: T0RJNUxXVmpPRFV0Tkc=
  root: T0RJNUxXVmpPRFV0Tkc=
  xtrabackup: T0RJNUxXVmpPRFV0Tkc=
kind: Secret
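The patch payload used above can also be built by a small helper, which avoids the nested quote escaping. This is only a sketch: the function name build_pmm_token_patch is hypothetical, and the secret name and namespace in the commented usage are copied from the command above.

```shell
# Build a merge-patch body that sets pmmservertoken to the base64 of the given token.
# printf avoids a trailing newline in the input; tr strips any line wrapping base64 may add.
build_pmm_token_patch() {
  encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
  printf '{"data":{"pmmservertoken":"%s"}}' "$encoded"
}

# Hypothetical usage (secret name and namespace as in the command above):
# kubectl patch secret internal-pxc-pxc-db -n pxc --type merge \
#   -p "$(build_pmm_token_patch "$PMMKEY")"
```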

I also had it in my values.yaml:

secrets:
  passwords:
    root: "ODI5LWVjODUtNG"
    xtrabackup: "ODI5LWVjODUtNG"
    monitor: "ODI5LWVjODUtNG"
    proxyadmin: "ODI5LWVjODUtNG"
    operator: "ODI5LWVjODUtNG"
    replication: "ODI5LWVjODUtNG"
    pmmserverkey: "glsa_slUzXewB1fsoTueZHNl7o3IoTxKw8i2m_6e185f9a"
    pmmservertoken: "glsa_slUzXewB1fsoTueZHNl7o3IoTxKw8i2m_6e185f9a"

but pmmservertoken was ignored completely because of the chart's template for the secret.

Nonetheless, even after manually patching the secret, the pmm-client container unfortunately still thinks it is running PMM2 and throws:

Config file /usr/local/percona/pmm2/config/pmm-agent.yaml is not writable: no such file or directory.

This is because pmmserverkey must not exist at all.
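The behaviour observed in this thread can be summarised as a small decision function. This is a sketch of what was reported here, not the operator's actual code: the function name observed_pmm_mode is hypothetical, and the precedence of pmmserverkey over pmmservertoken is inferred only from the symptoms described above.

```shell
# Sketch of the key-based selection as observed in this thread (not operator code).
# Arguments: the key names present in the cluster secret.
observed_pmm_mode() {
  has_key=0
  has_token=0
  for k in "$@"; do
    case "$k" in
      pmmserverkey)   has_key=1 ;;
      pmmservertoken) has_token=1 ;;
    esac
  done
  if [ "$has_key" -eq 1 ]; then
    echo "pmm2"   # pmmserverkey present: PMM2 path is used, even if the token also exists
  elif [ "$has_token" -eq 1 ]; then
    echo "pmm3"   # only pmmservertoken present
  else
    echo "none"   # neither key present; the thread reports an error in this case
  fi
}
```

For example, observed_pmm_mode root monitor pmmserverkey pmmservertoken yields pmm2, which corresponds to the failing case in which both keys are set.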

I opened this issue, https://github.com/percona/percona-xtradb-cluster-operator/issues/2224, before I found your Jira issue and this one.

Anyway, I managed to get pxc-db registered with PMM using the manual patch, after removing pmmserverkey from my values.yaml.

So it seems that solving the problem requires not only fixing the Helm chart template for the secret, but also changing how the operator looks for pmmserverkey?

I'd happily help out by testing the fresh installation routine for you, or anything else, if it helps.

sgohl • Oct 30 '25 10:10

Hi @sgohl, in my analysis the operator already works well: it manages the CRD correctly and sets the PMM_AGENT_CONFIG_FILE env variable to the correct path based on whether the pmmserverkey or pmmservertoken value exists. Only the chart has the wrong CRD template.

ynuyasha • Oct 31 '25 16:10