troubleshoot

Redacted data should be tokenized, not totally hidden

Open • laverya opened this issue 3 years ago • 1 comment

Describe the rationale for the suggested feature.

It's often important to know whether a request is failing to the IP of pod A or pod B, but all IPs are hidden. If IP A could be consistently distinguished from IP B, this could be determined with troubleshoot.

Describe the feature

Instead of replacing IPs with `***HIDDEN***`, we could replace them with a hash of a per-run salt plus the redacted text, so that each distinct IP maps to the same token everywhere within one run. https://github.com/replicatedhq/troubleshoot/blob/050f5939c6f4e90f7b3aea102ad96c261deb7be9/pkg/redact/redact.go#L16
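A minimal Go sketch of the proposed tokenization, assuming SHA-256 as the hash and a 16-byte random salt generated once per run. `newRunSalt`, `tokenize` and the `***HIDDEN-xxxxxxxx***` token format are hypothetical and not part of troubleshoot's `redact` package.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newRunSalt (hypothetical) returns random bytes generated once per collection run.
// Tokens are only comparable between values redacted with the same salt.
func newRunSalt() ([]byte, error) {
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		return nil, err
	}
	return salt, nil
}

// tokenize (hypothetical) replaces a sensitive value with a short, stable token
// derived from sha256(salt || value). The same value always yields the same
// token for a given salt, so IP A and IP B stay distinguishable without being shown.
func tokenize(salt []byte, value string) string {
	h := sha256.New()
	h.Write(salt)
	h.Write([]byte(value))
	sum := h.Sum(nil)
	// Keep only 4 bytes (8 hex chars) so redacted logs stay readable.
	return "***HIDDEN-" + hex.EncodeToString(sum[:4]) + "***"
}

func main() {
	salt, err := newRunSalt()
	if err != nil {
		panic(err)
	}
	a1 := tokenize(salt, "10.0.0.1")
	a2 := tokenize(salt, "10.0.0.1")
	b := tokenize(salt, "10.0.0.2")
	fmt.Println(a1 == a2) // true: same IP, same token
	fmt.Println(a1 == b)  // different IPs, so almost certainly false
}
```

Because IPv4 has only about 4.3 billion addresses, an unsalted hash of an IP could be reversed by hashing every address, so the salt has to stay out of the support bundle. Truncating the digest to 32 bits keeps tokens short, at the cost of a small chance that two different values share a token.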

This would require something other than `regexp`'s `ReplaceAllString`, since the replacement has to be computed from each match rather than taken from a fixed template, and that may reduce performance. https://github.com/replicatedhq/troubleshoot/blob/050f5939c6f4e90f7b3aea102ad96c261deb7be9/pkg/redact/multi_line.go#L64

Describe alternatives you've considered

Redacting fewer IPs would also solve this problem, but that is difficult to do without leaking data.

Additional context

laverya • Oct 07 '21 18:10

This is something that frequently becomes important in my own troubleshooting process.

There's a big difference between "connection from HIDDEN to HIDDEN failed" and "connection from host1 to host2 failed", especially when there are hundreds of similar messages and I don't know whether it's two hosts or two thousand. A per-run salted hash would be fantastic for this.

Bonus points if it could redact all IPs to addresses in the RFC 5737 documentation ranges, but that's probably too big a task and could be misleading in some cases.
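A rough sketch of the RFC 5737 idea, assuming the same salted SHA-256 approach: each IPv4 address is mapped to one of the 768 addresses in 192.0.2.0/24, 198.51.100.0/24 and 203.0.113.0/24 (TEST-NET-1 to TEST-NET-3). `docAddr` and `docNets` are hypothetical names, and the fixed salt in `main` is only for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// RFC 5737 documentation prefixes (TEST-NET-1, -2, -3), each a /24.
var docNets = [3][3]byte{
	{192, 0, 2},
	{198, 51, 100},
	{203, 0, 113},
}

// docAddr (hypothetical) maps an IP string to a stand-in address inside the
// RFC 5737 ranges using sha256(salt || ip), so the mapping is stable within
// one run. Only 3*256 = 768 stand-ins exist, so distinct IPs can collide.
func docAddr(salt []byte, ip string) string {
	h := sha256.New()
	h.Write(salt)
	h.Write([]byte(ip))
	n := binary.BigEndian.Uint32(h.Sum(nil)[:4]) % 768
	prefix := docNets[n/256]
	return fmt.Sprintf("%d.%d.%d.%d", prefix[0], prefix[1], prefix[2], n%256)
}

func main() {
	salt := []byte("per-run-salt") // illustration only; use random bytes per run
	for _, ip := range []string{"10.0.0.1", "10.0.0.2", "10.0.0.1"} {
		fmt.Println(ip, "->", docAddr(salt, ip))
	}
}
```

With only 768 stand-in addresses, the birthday bound gives roughly even odds of at least one collision once about 33 distinct IPs are mapped, so two real hosts could show up as one. That is one concrete way this could be misleading.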

programmerq • Nov 02 '21 20:11