troubleshoot
Redacted data should be tokenized, not totally hidden
Describe the rationale for the suggested feature.
It's often important to know whether a request is failing to the IP of pod A or pod B, but all IPs are hidden. If IP A could be consistently distinguished from IP B, this could be determined with troubleshoot.
Describe the feature
Instead of replacing IPs with ***HIDDEN***, we could replace them with the hash of a per-run salt + the redacted text. https://github.com/replicatedhq/troubleshoot/blob/050f5939c6f4e90f7b3aea102ad96c261deb7be9/pkg/redact/redact.go#L16
This would require using something other than regexp.ReplaceAllString, which may reduce performance. https://github.com/replicatedhq/troubleshoot/blob/050f5939c6f4e90f7b3aea102ad96c261deb7be9/pkg/redact/multi_line.go#L64
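To make the idea concrete, here is a rough sketch of what per-run tokenization might look like. It is not troubleshoot's actual code: `ipv4Pattern`, `newSalt`, `tokenize`, and `redactIPs` are hypothetical names, the IPv4 pattern is deliberately simplistic, and I've used HMAC-SHA256 keyed with the salt rather than a plain hash of salt + text. It relies on the standard library's `ReplaceAllStringFunc`, which calls a function for each match.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

// ipv4Pattern is a simple IPv4 matcher for illustration only;
// it is an assumption, not the pattern troubleshoot uses.
var ipv4Pattern = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)

// newSalt returns a random salt, meant to be generated once per run
// and discarded afterwards so tokens cannot be correlated across runs.
func newSalt() ([]byte, error) {
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		return nil, err
	}
	return salt, nil
}

// tokenize maps a sensitive value to a short token that is stable
// for a given salt. The digest is truncated to 4 bytes for readability,
// which makes collisions unlikely but not impossible.
func tokenize(salt []byte, value string) string {
	mac := hmac.New(sha256.New, salt)
	mac.Write([]byte(value))
	sum := mac.Sum(nil)
	return "***HIDDEN-" + hex.EncodeToString(sum[:4]) + "***"
}

// redactIPs replaces every IPv4-looking match with its token.
func redactIPs(salt []byte, line string) string {
	return ipv4Pattern.ReplaceAllStringFunc(line, func(match string) string {
		return tokenize(salt, match)
	})
}

func main() {
	salt, err := newSalt()
	if err != nil {
		panic(err)
	}
	// The same IP gets the same token in both lines; different IPs differ.
	fmt.Println(redactIPs(salt, "connection from 10.0.0.1 to 10.0.0.2 failed"))
	fmt.Println(redactIPs(salt, "connection from 10.0.0.1 to 10.0.0.3 failed"))
}
```

Because the salt changes on every run, the tokens printed above differ between runs, but within one run the same IP always maps to the same token.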
Describe alternatives you've considered
Reducing redaction of IPs would also solve this problem, but is difficult to do without data leakage.
Additional context
This is something that frequently becomes important in my own troubleshooting process.
There's a big difference between "connection from HIDDEN to HIDDEN failed" and "connection from host1 to host2 failed" -- especially when there are hundreds of similar messages and I don't know if it's two hosts or two thousand hosts. A per-run salted hash would be a fantastic way to achieve this.
Bonus points if it could redact all IPs to RFC 5737 IP ranges, but that's probably too big of a task and could be misleading in some cases.