troubleshoot
troubleshoot copied to clipboard
Make Troubleshoot accept multiple specs
Per the design in https://github.com/replicatedhq/troubleshoot/pull/662, make the code changes to Troubleshoot to support multiple specs during a single run.
- define a new interface that provides the merge functionality for collectors, analyzers, and redactors
- there should be a generic implementation of that interface which is used by default for all objects. This can simply use
append(). - specific collectors may have alternative implementations of the interface where overrides are required and should have a way to have the collector specific function called instead.
Particular overrides known at this point in time (for example only not to be implemented now):
- Longhorn and Ceph have a namespace config, if multiple collectors are specified with different namespaces then each should be collected
- clusterResources has an option namespaces config, these should be merged and deduplicated
- Two runPod collectors with the same name and different commands are unmergable. Only one should be run and an error logged.
- Two configMap collectors can be deduplicated and merged depending on the configurations
- Two logs collectors likely only need deduplication to prevent collecting the same thing twice
Definition of done:
- Code path to handle the generic merge with entry point for collector specific implementations provided
- Unit tests that verify the generic merge operation works
- Implement one collector specific override
- Unit test that verifies the override works as expected
- Design doc either in the repo or on troubleshoot.sh to introduce developers to how to write a custom merge for a collector
Some initial work on moving to a struct per interface similar to Host Collectors is here - https://github.com/replicatedhq/troubleshoot/pull/670
Wasn't able to get as much of a complete thought in as I wanted. Initially had some trouble deciding how I wanted to structure things. Feel free to take it in a completely different direction as whoever picks it up sees fit.
And had a question that I had been thinking on... Now that we've agreed there will be a simple merge/combining/dedup when aggregating all the different specs from different sources somewhere around here, does it make sense to move towards something like the following for the sake of collector level merging:
apiVersion: troubleshoot.sh/v1beta2
kind: SupporBundle
metadata:
name: bundle
spec:
collectors:
logs:
- collectorName: kube-system-logs
...
- collectorName: contour-logs
...
runPod:
- collectorName: ping-google
...
- collectorName: ping-replicated-app
...
This structures the spec a bit more so each collector can easily cherry pick exactly what it needs to worry about merging. Otherwise, I think each merge function for a collector would need to read the entire support bundle spec?
We should also define the benefits of the collector level merging because now that we've decided that specs will be aggregated/merged early in the process, I think that alone solves the initial problem raised by this proposal.
The structure you're showing for having one high level key per collector and just having all collectors of that type under it seems perfectly reasonable. It's not something we can enforce at the Spec level but I presume you're talking about having troubleshoot collect and merge everything into a master spec of this format. At that point you've already done the "just append everything" merge and as you said we don't need a 'default merge' for each collector. At this stage we just need to give each collector the option to further merge it's own section. And the fact that we'll already all collectors of a given type in a single list that should be very easy to do.
Assuming I'm reading that correctly, I agree with what you're saying. The only part I would probably expand on a little is that the initial merge doesn't completely address this issue. I think at least 1 collector needs to implement the optional 2nd stage merge so that we can confirm the codepath works and provide an example for how other collectors can do it.
@danj-replicated does the above make since to you? I think it's likely best if you pick this up from next week if you want to read through I can provide some additional context so you're set to make progress next week.
The first step here might be as simple as adding a merge interface, some merge logic in one or more implementations of that interface, and updating func runTroubleshoot(v *viper.Viper, arg string) error { to cope with arg being []string, then using the merge on it. Or the other way round, whichever works and is the least code changes for kots (which uses
Let's focus on what's needed for the minimum code changes to accept multiple, say, yaml files via the CLI, then work through what's needed for other input types (CRDs likely in the future, but not immediately).
Note also that kots does a bunch of merging, as well as storing the specs in K8s secrets, so there may be some helpful patterns to use.
So I thought I'd start approaching this from where @xavpaice mentioned. swapping out where we load the support bundle from a file with a loop over the cli arguments using the existing loading code so we don't lose any functionality. My plan is to create a "main" supportBundle, and then as we load more bundles from files, combine them into this primary struct, then hand that off to the rest of the application in the normal way as if nothing weird happened.
This "combining" logic could be expanded on to integrate more specialised merges later down the line.
so my first stab at this is available in https://github.com/replicatedhq/troubleshoot/tree/danj-multiple-inputs
naming conventions are open to suggestion.
Updated requirement (in description):
- [ ] CLI to accept multiple spec inputs for Collectors and Analyzers (excluded: Redactors)
- [ ] Troubleshoot to append each spec provided to make one, aggregated spec
- [ ] Troubleshoot to deduplicate ClusterResources and ClusterInfo
- [ ] Documentation for how to add deduplication for other collector types
- [ ] changes include tests
I'm now at a place where I'd want to start discussing with @diamonwiggins how to integrate our changes.
The current structure of collectors makes it unweildy to check for collector type and apply specific merges so I'm hoping that we can move towards having a merge method for a common collector interface.
Just noting that a portion of this was split off into Switch Collector from struct to interface which descopes the collector specific merge implementation and testing into that task. This one only need to document how the merge is expecting to introspect the to-be-created type/interface.
I question why we are deduplicating 2 specific collectors as part of this change. @danj-replicated is there something special about these two? Is there a reason we don't land this w/o deduplication and use those as the first example of a collector specific merge logic?
@chris-sanders nope, I only picked those as an example. when I get around to tidying this up and resolving the requested changes, I can remove these
Ok I'm going to remove that line from the Definition of Done then, I just wanted to check before changing it.
Reopened - TODO some kind of docs, or agree to have those drafted outside of the code repo and ready for the implementation of #680
Discussed docs, for the purpose of this task we can consider docs in the CLI help text to be enough, along with some minor updates on https://troubleshoot.sh/docs.
Having discussed docs further we agreed the CLI help text is adequate as is, and the troubleshoot.sh docs need a more significant overhaul regarding CLI usage than we can do in scope here. As such, I'm closing this task.