Improve the obfuscation process to retain the original patterns in logs.

Open lquerel opened this issue 1 year ago • 0 comments

The current obfuscation process preserves the length of the original text but doesn't maintain the patterns commonly found in logs. Consider the following logs as an example:

12:30:45 server 123 received http post request
12:31:23 server stopped with error "internal error"
12:31:24 server started
12:31:24 server 123 received http get request
12:31:24 server 456 received http get request

A log entry such as 12:30:45 server 123 received http post request likely results from a printf (or similar) function with the format string "%s server %d received http %s request", using three parameters: timestamp, server id, and http method. The last two log entries follow the same pattern.
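
To make the assumed origin of these entries concrete, here is a minimal sketch that renders the format string quoted above with its three parameters. The names `FORMAT` and `render` are hypothetical and exist only for illustration; the real emitting code is not known.

```python
# Format string quoted in the issue; FORMAT and render are illustrative names only.
FORMAT = "%s server %d received http %s request"


def render(timestamp: str, server_id: int, method: str) -> str:
    """Build one log entry from the three parameters named in the issue."""
    return FORMAT % (timestamp, server_id, method)


if __name__ == "__main__":
    print(render("12:30:45", 123, "post"))
    print(render("12:31:24", 456, "get"))
```

Only the three parameter slots vary between such entries; the literal words `server`, `received`, `http`, and `request` are the fixed part of the pattern that the obfuscation should keep recognizable.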

We seek an obfuscation method that retains these patterns while adhering to privacy and security constraints.

A potential approach is to split the log entry based on separator characters (e.g., spaces, commas, colons, dots), then obfuscate individual words, and finally reassemble them with the separators. So, instead of completely obfuscated logs like:

12:30:45 server 123 received http post request --> 34DF32dfgre0943tlkfgj0934tjlkjg09u34ldfklg
12:31:24 server 123 received http get request  --> 6u7kdjfhwnsd09wrjklsdmmw35-fd023;lks-56

We might get:

sd:4d:4l 34ft8o 785 qw4532 8ghj ywe4 lyt764
7l:d3:0k 34ft8o 456 qw4532 8ghj 6hy lyt764
  -  -   ------     -----------     -------       <-- preserved patterns
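
The split, obfuscate, and reassemble idea could be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: the separator set is taken from the examples in the text (spaces, commas, colons, dots), while the salted SHA-256 mapping, the rule that digits map to digits and everything else to lowercase letters, and the names `obfuscate_token`, `obfuscate_line`, and `demo-salt` are all hypothetical choices.

```python
import hashlib
import re
import string

# Separators suggested in the issue; the capturing group makes re.split
# return the separators too, so they can be reassembled unchanged.
SEPARATORS = re.compile(r"([ ,:.]+)")


def obfuscate_token(token: str, salt: str) -> str:
    """Map a token to a same-length replacement, deterministically.

    Assumption: digits become digits and any other character becomes a
    lowercase letter, so numeric fields still look numeric.
    """
    digest = hashlib.sha256((salt + "\x00" + token).encode("utf-8")).digest()
    out = []
    for i, ch in enumerate(token):
        byte = digest[i % len(digest)]  # digest bytes repeat for long tokens
        if ch.isdigit():
            out.append(string.digits[byte % 10])
        else:
            out.append(string.ascii_lowercase[byte % 26])
    return "".join(out)


def obfuscate_line(line: str, salt: str = "demo-salt") -> str:
    """Obfuscate each word of a log entry while keeping separators in place."""
    parts = SEPARATORS.split(line)
    # With a capturing group, odd indexes hold separators, even indexes hold words.
    return "".join(
        part if i % 2 else obfuscate_token(part, salt)
        for i, part in enumerate(parts)
    )


if __name__ == "__main__":
    for entry in (
        "12:30:45 server 123 received http post request",
        "12:31:24 server 123 received http get request",
    ):
        print(obfuscate_line(entry))
```

Because the mapping is deterministic for a given salt, identical words such as `server` or `request` produce identical output on every line, which is what keeps the pattern visible. The trade-off is that short or common tokens remain open to guessing by anyone who knows the salt, so a real implementation would need to weigh that against the privacy constraints mentioned above.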

lquerel, Oct 10 '23 18:10