
Add flexibility to collectors & analysers to allow running external programs

Open banjoh opened this issue 2 years ago • 4 comments

Describe the rationale for the suggested feature.

To support extensibility of the support-bundle & preflight binaries, it would be good to be able to run other programs available on a system without relying only on stdout/stderr/exit codes. Some example use cases that come to mind are:

  • Allowing cluster administrators who have home-built tools that collect and analyse data from hosts or k8s clusters to embed those tools, without the need to rewrite them as specs to fit troubleshoot's current model.
  • Allowing embedding in automation pipelines, e.g. CI, which come with a plethora of tools.

Describe the feature

The "how" part is still open, but here is a suggestion that has been touched on in a community meeting (notes can be found here) and other various discussions.

Here is my fictional collector that collects audit event logs, enriches them with user data and stores the output in $WORKSPACE_OUTPUT. $WORKSPACE_OUTPUT is a unique directory created by the framework for this collector instance. Its contents are then copied over to the bundle once the collector finishes executing. A sketch of what such a plugin script might look like follows the spec below.

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: run
spec:
  hostCollectors:
    - run:
        collectorName: "enriched-audit-logs"
        command: "python3"
        args: ["--timeout", "10m", "--output-dir", "$WORKSPACE_OUTPUT"]
        # Arbitrary parameters which get stored on disk as YAML/JSON and passed
        # on to the command via a $CONFIG env variable
        config:   # perhaps call it "params"?
          username: postgres
          password: <my-pass>
          dbHost: <hostname>
          map:
            key: value
          list:
            - val1
            - val2
  • Create a similar analyser which takes in arbitrary parameters to operate on
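For illustration only, here is a minimal sketch of what the hypothetical enrich_audit_logs.py script referenced in the spec above could look like. It assumes the proposed $CONFIG variable points at the serialised parameters (JSON in this sketch, to keep it stdlib-only) and that $WORKSPACE_OUTPUT is the per-collector output directory; none of these names are an agreed contract yet.

#!/usr/bin/env python3
"""Hypothetical plugin script for the fictional "enriched-audit-logs" collector."""
import json
import os
from pathlib import Path


def main() -> None:
    # Load the arbitrary parameters from the spec's config: block,
    # serialised to disk by the framework and exposed via $CONFIG.
    with open(os.environ["CONFIG"]) as f:
        params = json.load(f)

    # $WORKSPACE_OUTPUT is the unique directory created by the framework
    # for this collector instance; anything written here ends up in the bundle.
    output_dir = Path(os.environ["WORKSPACE_OUTPUT"])
    output_dir.mkdir(parents=True, exist_ok=True)

    # Placeholder for the real work: fetch audit events and enrich them with
    # user data from the database described in params.
    enriched_events = [{"event": "example", "user": params.get("username")}]

    (output_dir / "enriched-audit-events.json").write_text(
        json.dumps(enriched_events, indent=2)
    )


if __name__ == "__main__":
    main()

The framework would then copy everything written under $WORKSPACE_OUTPUT into the support bundle once the collector completes.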

Open questions

  • What would this look like in-cluster? Extending the run pod collector?
  • Is this the only way to allow extensibility?

Inspirations

  • https://k9scli.io/topics/plugins/
  • https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-an-environment-variable - GitHub Actions runners pass environment variables to jobs so that they can store information that persists across jobs.

banjoh · Mar 24 '23 18:03

The plugin interface should define a standard way of retrieving the kubeconfig. support-bundle supports reading the kubeconfig from the KUBECONFIG env variable or as a CLI argument. If it is passed as an argument, the plugin would have no way of retrieving the kubeconfig path. Maybe the interface should guarantee that KUBECONFIG is always populated?

mhrabovcin · Mar 27 '23 13:03

The plugin interface should define standard way of retrieving kubeconfig.

Very correct. KUBECONFIG and a few other variables will be part of the constants passed to plugins. Here's a non-exhaustive list:

  • KUBECONFIG - how to connect to the cluster
  • Config parameters file, e.g. PLUGIN_CONFIG. We might want to provide a parameter (json|yaml) to define the file format on disk
  • WORKSPACE - a place the plugin can run in. It would be the plugin's $CWD on launch. WORKSPACE/output can then contain all collected files. TBD ....

Inspiration from helm: https://helm.sh/docs/topics/plugins/#environment-variables
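As a sketch only, a plugin could bootstrap itself from these proposed variables roughly as follows. The names (KUBECONFIG, PLUGIN_CONFIG, WORKSPACE) mirror the list above and are not a finalised contract.

#!/usr/bin/env python3
"""Illustrative sketch: resolving the constants proposed to be passed to plugins."""
import os
from pathlib import Path


def plugin_environment() -> dict:
    workspace = Path(os.environ.get("WORKSPACE", os.getcwd()))
    return {
        # Proposed to be guaranteed, even if the user passed the kubeconfig
        # path as a CLI argument to support-bundle/preflight.
        "kubeconfig": os.environ.get("KUBECONFIG"),
        # Path to the serialised config/params block from the spec.
        "config_file": os.environ.get("PLUGIN_CONFIG"),
        # The plugin's $CWD on launch; collected files would go under WORKSPACE/output.
        "output_dir": workspace / "output",
    }


if __name__ == "__main__":
    env = plugin_environment()
    env["output_dir"].mkdir(parents=True, exist_ok=True)
    print(env)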

banjoh · Mar 27 '23 16:03

Addressed by https://github.com/replicatedhq/troubleshoot/pull/1376

banjoh · Nov 16 '23 13:11

Reopening since we might want to put some thought into the analyser side of things. The ability to inject custom analyser logic without needing to write new analysers is worth considering.

banjoh · Nov 17 '23 16:11