crash-diagnostics
Crash-Diagnostics (Crashd) is a tool to help investigate, analyze, and troubleshoot unresponsive or crashed Kubernetes clusters.
### What is happening? While running the tests, the created `kind` clusters get added to the default location of the kube config `$HOME/.kube/config` and the current context set in the...
SSH is not always available or ideal; in many cases, using a DaemonSet or Jobs to collect logs is more practical, e.g. https://github.com/kubernetes-sigs/cluster-api/issues/3344
When Crashd captures files and saves them in the working directory, it currently overwrites existing files or keeps adding files to the same working directory used by the previous run. This can...
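One way to avoid collisions between runs is to give each run its own subdirectory. The following is a minimal sketch of that idea in plain Python, not Crashd's implementation; the function name `new_run_dir` and the `crashd-run` prefix are hypothetical and chosen only for illustration.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def new_run_dir(base: Path, prefix: str = "crashd-run") -> Path:
    """Create and return a fresh subdirectory of `base` for a single run.

    Hypothetical helper: the name and prefix are assumptions, not Crashd API.
    """
    base.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # mkdtemp appends a random suffix, so two runs started within the same
    # second still get distinct directories and never overwrite each other.
    return Path(tempfile.mkdtemp(prefix=f"{prefix}-{stamp}-", dir=str(base)))
```

Each call returns a new, empty directory, so files captured by an earlier run are left untouched.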
Ensure all tests that use the SSH container are parameterized with a mounted volume, server name, and port
Currently, when the SSH container is started for tests, it uses the same mounted volume location for all tests. This can cause some tests to fail, especially when run in...
As a script writer, I should be able to specify connectivity/retry parameters for kube-related commands. For instance, `kube_config` should be updated to support max-retries and timeout as follows: ```python kube_config(path="./kube/config",...
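To illustrate what retry and timeout parameters could mean in practice, here is a generic sketch in plain Python. It is not the proposed `kube_config` signature, which is truncated above; `call_with_retries`, `max_retries`, `timeout`, and `delay` are hypothetical names, and `timeout` is assumed here to be an overall time budget rather than a per-call limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_retries: int = 3,
                      timeout: float = 30.0, delay: float = 1.0) -> T:
    """Call `fn` until it succeeds, retries run out, or the time budget is spent.

    Hypothetical helper: `max_retries` counts retries after the first attempt,
    and `timeout` is an overall budget in seconds (both are assumptions).
    """
    deadline = time.monotonic() + timeout
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            attempt += 1
            # Re-raise the last error when retries are used up or when the
            # next wait would run past the overall deadline.
            if attempt > max_retries or time.monotonic() + delay > deadline:
                raise
            time.sleep(delay)
```

A caller would wrap the connectivity check, for example `call_with_retries(connect, max_retries=5, timeout=60.0)`, where `connect` stands in for whatever function reaches the API server.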
Currently, the crash-diagnostics codebase uses [logrus](https://github.com/Sirupsen/logrus), which has some potential issues with goroutines. Upstream CAPI uses `logr`, an interface-based logging mechanism. Consider switching to `logr`.
As a script developer, I should be able to specify the programs that the script requires. If a program is not found or not installed, the script should fail fast....
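The fail-fast check described here can be sketched in plain Python with the standard library's `shutil.which`. This only illustrates the idea and is not a Crashd script function; `missing_programs` and `require_programs` are hypothetical names.

```python
import shutil
from typing import Iterable, List


def missing_programs(names: Iterable[str]) -> List[str]:
    """Return the program names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def require_programs(*names: str) -> None:
    """Raise before any real work starts if a required program is missing.

    Hypothetical helper: the name and error type are assumptions.
    """
    missing = missing_programs(names)
    if missing:
        raise RuntimeError("required programs not found: " + ", ".join(missing))
```

Calling `require_programs("kubectl", "ssh")` at the top of a script would stop it immediately with a single message listing every missing program, instead of failing partway through a collection run.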
Currently, the code ignores and does not validate host keys during an SSH/SCP operation. While this allows Crashd scripts to run quietly, it can be viewed as a security issue...
There are two really useful text snippets we can collect about etcd over SSH # Note these have to happen on CAPI Masters ## 1) results of etcd perf, making...