
rpk: provide alternative to ClusterRoleBinding for `rpk debug bundle` in k8s environments

Open • r-vasquez opened this issue 1 year ago • 2 comments

Who is this for and what problem do they have today?

Currently, rpk relies on a ClusterRole to collect the information needed for the debug bundle; see:

https://docs.redpanda.com/current/manage/kubernetes/troubleshooting/k-diagnostics-bundle/#generate-a-diagnostics-bundle

This is done to:

  1. Discover the Admin API addresses of the cluster. Currently, there is no other way to do that (see https://github.com/redpanda-data/redpanda/issues/8975).
  2. Collect the logs of every pod in the cluster. This saves time in large clusters, because the user only has to create one bundle instead of n bundles.
  3. Collect the k8s resources in the Redpanda namespace for debugging.
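For item 1, once rpk is allowed to list pods (which the ClusterRole presumably grants), one way to derive Admin API addresses is to read each pod's IP and its admin container port. The Go sketch below is hypothetical and not taken from the rpk source: `Pod` and `ContainerPort` are trimmed-down stand-ins for the client-go types, and it assumes the Redpanda container exposes a port named `admin` (9644 is Redpanda's default Admin API port).

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strconv"
)

// ContainerPort is a hypothetical, trimmed-down stand-in for corev1.ContainerPort.
type ContainerPort struct {
	Name          string
	ContainerPort int32
}

// Pod is a hypothetical stand-in for the parts of a corev1.Pod that rpk
// would read after listing pods in the Redpanda namespace.
type Pod struct {
	Name  string
	PodIP string
	Ports []ContainerPort
}

// adminAddresses returns one host:port per pod that exposes a port with the
// given name (assumed to be "admin"). Pods without an IP (for example, not
// yet scheduled) or without that port are skipped. The result is sorted so
// the output is deterministic.
func adminAddresses(pods []Pod, portName string) []string {
	var addrs []string
	for _, p := range pods {
		if p.PodIP == "" {
			continue
		}
		for _, cp := range p.Ports {
			if cp.Name == portName {
				// JoinHostPort also brackets IPv6 addresses correctly.
				addrs = append(addrs, net.JoinHostPort(p.PodIP, strconv.Itoa(int(cp.ContainerPort))))
				break
			}
		}
	}
	sort.Strings(addrs)
	return addrs
}

func main() {
	pods := []Pod{
		{Name: "redpanda-1", PodIP: "10.0.0.12", Ports: []ContainerPort{{Name: "admin", ContainerPort: 9644}}},
		{Name: "redpanda-0", PodIP: "10.0.0.11", Ports: []ContainerPort{{Name: "admin", ContainerPort: 9644}}},
		{Name: "redpanda-2", PodIP: "", Ports: []ContainerPort{{Name: "admin", ContainerPort: 9644}}}, // no IP yet
	}
	fmt.Println(adminAddresses(pods, "admin")) // [10.0.0.11:9644 10.0.0.12:9644]
}
```

Relying on a named port rather than a hard-coded 9644 keeps the sketch working if a deployment overrides the Admin API port.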

Alternatives discussed:

This issue tracks the discussion. The alternatives discussed so far are:

  • Use a kubeconfig to authenticate: the bundle would have to be run from the debugger's machine, which reads the kubeconfig and authenticates against the k8s API. This allows collecting the logs and resources, but it has limitations when it comes to the Admin API calls.
  • Use a RoleBinding instead, so the permissions stay within the namespace.
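For the kubeconfig alternative, the first step on the debugger's machine is locating the kubeconfig. A minimal, stdlib-only Go sketch, assuming the usual kubectl/client-go convention (a `KUBECONFIG` list of paths, otherwise `$HOME/.kube/config`). `kubeconfigPaths` is a hypothetical helper; unlike client-go it does not deduplicate or merge files, and parsing and authentication would be left to client-go (for example `clientcmd.BuildConfigFromFlags`, as used in the out-of-cluster example linked in the comment below).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// kubeconfigPaths returns the kubeconfig files an out-of-cluster client
// would consider, in priority order. If KUBECONFIG is set, it is split on
// the OS path-list separator (":" on Linux/macOS, ";" on Windows); empty or
// whitespace-only entries are ignored. Otherwise it falls back to
// $HOME/.kube/config.
func kubeconfigPaths(kubeconfigEnv, home string) []string {
	if kubeconfigEnv != "" {
		var paths []string
		for _, p := range filepath.SplitList(kubeconfigEnv) {
			if strings.TrimSpace(p) != "" {
				paths = append(paths, p)
			}
		}
		if len(paths) > 0 {
			return paths
		}
	}
	return []string{filepath.Join(home, ".kube", "config")}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	for _, p := range kubeconfigPaths(os.Getenv("KUBECONFIG"), home) {
		fmt.Println(p)
	}
}
```

Because this runs on the debugger's machine, the credentials are the debugger's own, so no ClusterRoleBinding has to be created inside the cluster for the logs and resources.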

JIRA Link: CORE-2649

r-vasquez • Apr 24 '24 20:04

One of the alternatives discussed with @chrisseto is to:

  1. Authenticate from an out-of-cluster client (rpk) to the k8s API using the kubeconfig file. Example: https://github.com/kubernetes/client-go/tree/master/examples/out-of-cluster-client-configuration
  2. Grab all the information needed from the k8s API, in this case:
     a. Logs.
     b. Discover all pods in the cluster and get the Admin API addresses.
     c. Resources for debugging (current list here).
  3. From the out-of-cluster client, call every pod and execute a modified `rpk debug bundle` in each redpanda container:
     a. The modified command will need to be able to receive the previously gathered k8s info.
     b. The modified command will need to return the bundle information in a way that the caller can read it.
     c. To execute a command, we can use: https://github.com/kubernetes/client-go/tree/master/tools/remotecommand
  4. The out-of-cluster client will have to stitch together the bundles obtained from each pod.

This would be a major refactor of how the command currently works and will likely need an RFC first.

Please be aware that some of these changes are being made to overcome current limitations:

  • To discover admin API addresses:
    • https://github.com/redpanda-data/redpanda/issues/8975
    • https://github.com/redpanda-data/redpanda/issues/8972
  • To store logs on disk in k8s environments (not yet discussed with Core).

r-vasquez • Jun 17 '24 20:06

Moving from Core redpanda to redpanda-operator

david-yu • Apr 14 '25 17:04