troubleshoot icon indicating copy to clipboard operation
troubleshoot copied to clipboard

It's hard to build a succinct regex log analyzer if a crashing pod's previous pod logs are collected

Open dexhorthy opened this issue 3 years ago • 4 comments

Describe the rationale for the suggested feature.

When collecting a bundle for a crashing pod, a top use case for troubleshoot is to be able to analyze a pod's logs to search for errors that might explain the crashloop. But troubleshoot is great at collecting the kubectl logs -p output, the logs from the previous crash. Because of how file path globbing works.

For example, if we have these logs in the bundle

x support-bundle-2022-09-19T19_41_59/pod-logs/my-app-59b8d785c7-wrv5s-previous.log
x support-bundle-2022-09-19T19_41_59/pod-logs/my-app-59b8d785c7-wrv5s.log

and this analyzer:

    - textAnalyze:
        checkName: No Errors in Logs
        fileName: pod-logs/my-app-*.log
        regex: 'prevent-start'
        outcomes:
          - pass:
              when: "false"
              message: "No concerning logs found"
          - fail:
              when: "true"
              message: |
                There's an error in startup -- if you forgot to prevent preventing starting, try removing the 'com.example.myapp/prevent-start' annotation

we get two of the same analyzer insight:

Screen Shot 2022-09-21 at 7 53 59 PM Screen Shot 2022-09-21 at 8 11 24 PM

Describe the feature

Ideally we analyze the previous logs if they're there, otherwise collect the current ones. Unless the pod has been Running for a long time (hours or days) and is no longer in a CrashLoopBackOff -- then we don't care about the previous logs and would rather analyze the running pod.

In the short term, it would be great to be able to use a proper regex in fileName so we could at least choose one log output or the other and avoid having two.

dexhorthy avatar Sep 22 '22 01:09 dexhorthy

Probably a pair of feature requests in this one:

  • regex for the fileName selector
  • more intelligent analysis of current and previous logs - which needs a lot more definition

During support cases we often find it's useful to look at current and previous logs, however whether an analyzer should look at either or both is quite situation dependent. I think, at this stage, it might be best to either have a regex like you suggest, or some kind of exclude which runs after the include, and is well documented so that folks can understand it. That gets hard if there's multiple specs and they conflict.

At this stage, I'm inclined to suggest we look at whether using regex is possible. Currently we use filepath.Glob, finding files using a regex is going to add some code, and we currently use the same file finder for many analyzers.

@dexhorthy what about using pod-logs/my-app-*[!previous].log for the file path, which complies with glob rules and excludes the previous logs?

xavpaice avatar Sep 23 '22 03:09 xavpaice

just tested that glob pattern with Go and it doesn't work the same way as in a shell.

We could consider switching from filepath.Glob to some other package which does support the exclusion option, if we can find a current one.

xavpaice avatar Sep 23 '22 03:09 xavpaice

Thanks for looking into this!

just tested that glob pattern with Go and it doesn't work the same way as in a shell.

I tried the same to no avail 😞

dexhorthy avatar Sep 26 '22 13:09 dexhorthy

Option:

Add a excludeFiles option to textAnalyze. This option is a glob which is expanded by Troubleshoot to a list of files to exclude from fileName.

We would then subtract the map returned by filepath.Glob for excludeFiles from the map returned by filepath.Glob from fileName. We could make a simple diff function like this (not my code:

package main

import "fmt"

// difference returns the elements in `a` that aren't in `b`.
func difference(a, b []string) []string {
    mb := make(map[string]struct{}, len(b))
    for _, x := range b {
        mb[x] = struct{}{}
    }
    var diff []string
    for _, x := range a {
        if _, found := mb[x]; !found {
            diff = append(diff, x)
        }
    }
    return diff
}

func main() {
    slice1 := []string{"foo", "bar","hello"}
    slice2 := []string{"foo", "bar"}
    fmt.Println(difference(slice1, slice2))
}

We would need to make this modification approximately here in getChildFileContents, and maybe in preflight too. See https://github.com/replicatedhq/troubleshoot/blob/44ae409081a09d007904bfe9f3855263c7f6d793/pkg/analyze/analyzer.go where results, err := analyzeTextAnalyze(analyzer.TextAnalyze, findFiles) where findFiles is type getChildCollectedFileContents. We would want to add something to that and potentially consider other analyzsers too, since a new param on getChildFileContents is not going to be optional though could be nil.

xavpaice avatar Sep 30 '22 03:09 xavpaice