Attempt at UnifiedLog Iterator
Basic PR attempting to create an iterator for tracev3 files.
Allows the caller to iterate through the chunk(s) of a tracev3 file. The original implementation of the parser parsed all chunks at once and returned everything.
Now we offer a way to parse one chunk at a time, with the goal of reducing memory usage a bit.
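For illustration, usage looks roughly like the sketch below (adapted from the example project in this PR; the tracev3 path is illustrative, and the `strings`, `shared_strings`, and `timesync` lookup tables are assumed to have been collected up front as in the crate's existing examples):

```rust
use macos_unifiedlogs::iterator::UnifiedLogIterator; // path as used in this PR
use macos_unifiedlogs::parser::build_log; // assumed module path for build_log

// Minimal sketch, assuming strings/shared_strings/timesync were already
// collected as in the crate's existing examples. The path is illustrative.
let log_bytes = std::fs::read("system_logs.logarchive/Persist/0000000000000001.tracev3").unwrap();

let log_iterator = UnifiedLogIterator {
    data: log_bytes,
    header: Vec::new(),
};

for chunk in log_iterator {
    // Each iteration decompresses and parses a single chunk instead of
    // materializing the whole file's parsed output at once.
    let (results, _missing) = build_log(&chunk, &strings, &shared_strings, &timesync, true);
    println!("Built {} log entries from this chunk", results.len());
}
```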
Hey @jrouaix, I finally have time to try to improve this library a bit to reduce memory usage. Apologies for the delay.
I see in your PR #22 you are also making some changes.
I added a function to iterate through the log files in chunks. Below is a very high-level overview of the log format:
Tracev3 log format:

```
header
gzip compressed block
    catalog      <- metadata on the logs
    chunkset     <- collection of logs
    chunkset
    chunkset
    chunkset
    chunkset
end of block
gzip compressed block
    catalog
    chunkset
    chunkset
    chunkset
    chunkset
    chunkset
    chunkset
    chunkset
end of block
<repeat XXXX times>
```
Originally the library would parse the entire tracev3 file at once, looping through, parsing, and decompressing all the data, and then returning all of the log data for the user to build (build_log).
This PR adds an iterator that parses one gzip-compressed chunk at a time. This reduces memory usage a good bit: a ~150-200MB decrease on my test system.
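As a mental model, the iterator only has to remember how far into the raw bytes it has gotten, doing the decompress-and-parse work lazily, one block per `next()` call. A simplified skeleton of that shape (illustrative only, not the PR's actual code; `parse_one_block` is a hypothetical stand-in for the real header/catalog/chunkset parsing):

```rust
// Illustrative skeleton of a chunked tracev3 iterator (not the PR's exact code).
struct ChunkIterator {
    data: Vec<u8>, // raw tracev3 bytes
    offset: usize, // start of the next unparsed gzip-compressed block
}

impl Iterator for ChunkIterator {
    type Item = UnifiedLogData;

    fn next(&mut self) -> Option<Self::Item> {
        if self.offset >= self.data.len() {
            return None; // no more blocks to parse
        }
        // Hypothetical helper: decompress and parse exactly one block,
        // returning the parsed data plus the number of bytes consumed.
        let (chunk, consumed) = parse_one_block(&self.data[self.offset..])?;
        self.offset += consumed;
        Some(chunk)
    }
}
```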
From the example projects in the repo:
- unifiedlog_parser: ~500-600MB, uses the original parser to parse the entire tracev3 file at once
- unifiedlog_iterator: ~400MB, uses the new iterator approach (example included in this PR)
Based on my tests it looks like most of the memory usage comes from the strings we read and cache from /var/db/uuidtext. On my test system this directory contains ~350MB of data.
I added an example project in this PR that uses the new iterator approach to parsing log files. If you have time to run/test it, I would be interested to know whether you see a memory decrease.
Will do!
Hi @puffyCid! I gave this branch a try, and there is indeed a very nice memory-wise improvement.
On a standard run of our product, I see around 200+ MB less peak memory used (out of 900+ MB, more than 20%; and since it's not all unified logs, the improvement within your lib alone is even bigger).
Yet, there is a pretty big perf regression when using this version.

New version:

```
real 13m29.036s
user 13m56.268s
sys   1m18.550s
```

Previous one:

```
real  9m57.806s
user 11m13.398s
sys   0m45.651s
```
I wonder if it could be caused by the oversize_strings copying I commented about previously: https://github.com/mandiant/macos-UnifiedLogs/pull/26#discussion_r1784626058
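For reference, the pattern in question (quoted from `parse_file_new` below; a fragment, not runnable on its own) moves the accumulated Oversize entries into every chunk and back out on each iteration:

```rust
// From parse_file_new below: the Oversize cache round-trips through every chunk.
chunk.oversize.append(&mut self.oversize_strings); // move the cache into the chunk
let (results, mut missing_logs) = build_log(/* ... */);
self.oversize_strings = chunk.oversize; // move it back, plus any new entries
```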
Thanks for the benchmarks, I'll take a look and troubleshoot the performance.
@jrouaix I made some minor fixes to the unifiedlog_iterator example and ran some basic benchmarks with hyperfine.
Performance of the unifiedlog_iterator and unifiedlog_parser examples appears to be nearly the same.
In the example below, system_logs.logarchive has ~9.6 million log entries:
```
hyperfine "./unifiedlog_iterator -i system_logs.logarchive -o out.csv" "./unifiedlog_parser -i system_logs.logarchive -o out2.csv"

Benchmark 1: ./unifiedlog_iterator -i system_logs.logarchive -o out.csv
  Time (mean ± σ):     58.799 s ±  0.327 s    [User: 54.750 s, System: 3.507 s]
  Range (min … max):   58.203 s … 59.275 s    10 runs

Benchmark 2: ./unifiedlog_parser -i system_logs.logarchive -o out2.csv
  Time (mean ± σ):     57.804 s ±  0.129 s    [User: 53.366 s, System: 3.984 s]
  Range (min … max):   57.576 s … 57.949 s    10 runs

Summary
  ./unifiedlog_parser -i system_logs.logarchive -o out2.csv ran
    1.02 ± 0.01 times faster than ./unifiedlog_iterator -i system_logs.logarchive -o out.csv
```
Do you still see a performance regression?
I'm so sorry @puffyCid, I didn't have much time in the previous week (and it won't be better in the next ones). If it's OK on your side, don't worry and merge; the perf regression is probably in our usage (we were expecting chunks of data, and now they are chunks of 1, which could have a pretty big side effect!).
@puffyCid I have more intel here, yet I can't find where/why it's slightly slower:
- same counts
- same overall output in our product
- the new version saves a lot of RAM 🎉
- yet the new version is about 4% slower than the old one (2s over a 50s full process)
The two implementations in our product. The new one:
```rust
fn parse_file_new(&mut self, full_path: &str) {
    let Ok(log_bytes) = fs::read(full_path) else {
        return;
    };
    let log_iterator = macos_unifiedlogs::iterator::UnifiedLogIterator {
        data: log_bytes,
        header: Vec::new(),
    };
    let mut missing_unified_log_data_vec = UnifiedLogData {
        header: Vec::new(),
        catalog_data: Vec::new(),
        oversize: Vec::new(),
    };
    for mut chunk in log_iterator {
        // Move the accumulated Oversize entries into the chunk, build the
        // logs, then move the (possibly larger) cache back out
        chunk.oversize.append(&mut self.oversize_strings);
        let (results, mut missing_logs) = build_log(
            &chunk,
            &self.string_results,
            &self.shared_strings_results,
            &self.timesync_data,
            true,
        );
        self.oversize_strings = chunk.oversize;
        self.logs_count += results.len();
        self.chuncks_count += 1;
        missing_unified_log_data_vec.header.append(&mut missing_logs.header);
        missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
        missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);
    }
    self.leftover_logs.push(missing_unified_log_data_vec);
}
```
The old one is:
```rust
fn parse_file(&mut self, full_path: &str) {
    let Ok(mut log_data) =
        parse_log(full_path).map_err(|e| tracing::error!("unified log parse_file error : {e}"))
    else {
        return;
    };
    log_data.oversize.append(&mut self.oversize_strings);
    // Get all constructed logs and any log data that failed to get constructed (exclude_missing = true)
    let Ok(iterator) = iter_log(
        &log_data,
        &self.string_results,
        &self.shared_strings_results,
        &self.timesync_data,
        true,
    )
    .map_err(|e| tracing::warn!("Failed to iterate over logs: {e}"))
    else {
        return;
    };
    let mut missing_unified_log_data_vec = UnifiedLogData {
        header: Vec::new(),
        catalog_data: Vec::new(),
        oversize: Vec::new(),
    };
    for (results, mut missing_logs) in iterator {
        // Track Oversize entries
        self.logs_count += results.len();
        self.chuncks_count += 1;
        missing_unified_log_data_vec.header.append(&mut missing_logs.header);
        missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
        missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);
    }
    self.oversize_strings = log_data.oversize;
    self.leftover_logs.push(missing_unified_log_data_vec);
}
```
@jrouaix let me see if I can make some local benchmarks based on your implementation.
One interesting observation: it looks like you're appending the data individually:

```rust
missing_unified_log_data_vec.header.append(&mut missing_logs.header);
missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);
```

I think in the example I track the missing data as a `Vec<UnifiedLogData>`: `missing.push(missing_logs);`
Perhaps appending this data three times individually on every iteration (vs once per file in the original implementation) could be the cause of the regression? A sketch of that approach follows.
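For comparison, a rough sketch of the loop in the PR example (same assumptions about the lookup tables as earlier; `oversize_strings` stands in for the caller's Oversize cache):

```rust
// Sketch of the PR example's approach: collect each chunk's missing data as a
// single value instead of appending its three Vecs separately per iteration.
let mut missing: Vec<UnifiedLogData> = Vec::new();
for mut chunk in log_iterator {
    chunk.oversize.append(&mut oversize_strings);
    let (results, missing_logs) = build_log(&chunk, &strings, &shared_strings, &timesync, true);
    oversize_strings = chunk.oversize;
    missing.push(missing_logs); // one move per chunk, no per-field appends
}
```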
As mentioned, let me try to recreate this locally and see if I can replicate the performance regression.
Another idea: what if you check for empty values first? If header, catalog_data, and oversize are all empty, do not append to missing_unified_log_data_vec and just continue.
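Roughly, something like this inside the chunk loop of `parse_file_new` (a sketch grounded in the code above):

```rust
// Skip chunks that produced no missing data at all, avoiding three appends
// per iteration.
if missing_logs.header.is_empty()
    && missing_logs.catalog_data.is_empty()
    && missing_logs.oversize.is_empty()
{
    continue;
}
missing_unified_log_data_vec.header.append(&mut missing_logs.header);
missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);
```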