Attempt at UnifiedLog Iterator
Basic PR attempting to create an iterator for tracev3 files.
Allows the caller to iterate through the chunk(s) of a tracev3 file. The original implementation of the parser parsed all chunks at once and returned everything.
Now we offer a way to parse one chunk at a time, with the goal of reducing memory usage a bit.
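For illustration, usage looks roughly like the sketch below (adapted from the example project in this PR; the tracev3 path is illustrative, and the `strings`, `shared_strings`, and `timesync` lookup tables are assumed to have been collected up front as in the crate's existing examples):

```rust
use macos_unifiedlogs::iterator::UnifiedLogIterator; // path as used in this PR
use macos_unifiedlogs::parser::build_log; // assumed module path for build_log

// Minimal sketch, assuming strings/shared_strings/timesync were already
// collected as in the crate's existing examples. The path is illustrative.
let log_bytes = std::fs::read("system_logs.logarchive/Persist/0000000000000001.tracev3").unwrap();

let log_iterator = UnifiedLogIterator {
    data: log_bytes,
    header: Vec::new(),
};

for chunk in log_iterator {
    // Each iteration decompresses and parses a single chunk instead of
    // materializing the whole file's parsed output at once.
    let (results, _missing) = build_log(&chunk, &strings, &shared_strings, &timesync, true);
    println!("Built {} log entries from this chunk", results.len());
}
```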
Hey @jrouaix, I finally have time to try to improve this library a bit to reduce memory usage. Apologies for the delay.
I see in your PR #22 you are also making some changes.
I added a function to iterate through the log files in chunks. Below is a very high-level overview of the log format:
Tracev3 log format:

```
header
gzip compressed block
    catalog      <- metadata on the logs
    chunkset     <- collection of logs
    chunkset
    chunkset
    chunkset
    chunkset
end of block
gzip compressed block
    catalog
    chunkset
    chunkset
    chunkset
    chunkset
    chunkset
    chunkset
    chunkset
end of block
<repeat XXXX times>
```
Originally the library would parse the entire tracev3 file at once, looping through, parsing, and decompressing all the data, and then returning all of the log data for the user to build (build_log).
This PR adds an iterator that parses one gzip-compressed chunk at a time. This reduces memory usage a good bit: a ~150-200MB decrease on my test system.
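As a mental model, the iterator only has to remember how far into the raw bytes it has gotten, doing the decompress-and-parse work lazily, one block per `next()` call. A simplified skeleton of that shape (illustrative only, not the PR's actual code; `parse_one_block` is a hypothetical stand-in for the real header/catalog/chunkset parsing):

```rust
// Illustrative skeleton of a chunked tracev3 iterator (not the PR's exact code).
struct ChunkIterator {
    data: Vec<u8>, // raw tracev3 bytes
    offset: usize, // start of the next unparsed gzip-compressed block
}

impl Iterator for ChunkIterator {
    type Item = UnifiedLogData;

    fn next(&mut self) -> Option<Self::Item> {
        if self.offset >= self.data.len() {
            return None; // no more blocks to parse
        }
        // Hypothetical helper: decompress and parse exactly one block,
        // returning the parsed data plus the number of bytes consumed.
        let (chunk, consumed) = parse_one_block(&self.data[self.offset..])?;
        self.offset += consumed;
        Some(chunk)
    }
}
```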
From the example projects in the repo:
- unifiedlog_parser: ~500-600MB, uses the original parser to parse the entire tracev3 file at once
- unifiedlog_iterator: ~400MB, uses the new iterator approach (example included in this PR)
Based on my tests it looks like most of the memory usage comes from the strings we read and cache from /var/db/uuidtext. On my test system this directory contains ~350MB of data.
I added an example project in this PR that uses the new iterator approach to parsing log files. If you have time to run/test it, I would be interested to know whether you see a memory decrease.
Will do!
Hi @puffyCid! I gave this branch a try, and there is indeed a very nice memory-wise improvement.
On a standard run of our product, I see around 200+ MB less peak memory used (out of 900+ MB, more than 20%; and since it's not all unified logs, the improvement within your lib alone is even bigger).
Yet, there is a pretty big perf regression when using this version.

New version:

```
real 13m29.036s
user 13m56.268s
sys   1m18.550s
```

Previous one:

```
real  9m57.806s
user 11m13.398s
sys   0m45.651s
```
I wonder if it could be caused by the oversize_strings copying I commented about previously: https://github.com/mandiant/macos-UnifiedLogs/pull/26#discussion_r1784626058
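For reference, the pattern in question (quoted from `parse_file_new` below; a fragment, not runnable on its own) moves the accumulated Oversize entries into every chunk and back out on each iteration:

```rust
// From parse_file_new below: the Oversize cache round-trips through every chunk.
chunk.oversize.append(&mut self.oversize_strings); // move the cache into the chunk
let (results, mut missing_logs) = build_log(/* ... */);
self.oversize_strings = chunk.oversize; // move it back, plus any new entries
```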
Thanks for the benchmarks, I'll take a look and troubleshoot the performance.
@jrouaix I made some minor fixes to the unifiedlog_iterator example and ran some basic benchmarks with hyperfine.
Performance of the unifiedlog_iterator and unifiedlog_parser examples appears to be nearly the same.
In the example below, system_logs.logarchive has ~9.6 million log entries:
```
hyperfine "./unifiedlog_iterator -i system_logs.logarchive -o out.csv" "./unifiedlog_parser -i system_logs.logarchive -o out2.csv"

Benchmark 1: ./unifiedlog_iterator -i system_logs.logarchive -o out.csv
  Time (mean ± σ):     58.799 s ±  0.327 s    [User: 54.750 s, System: 3.507 s]
  Range (min … max):   58.203 s … 59.275 s    10 runs

Benchmark 2: ./unifiedlog_parser -i system_logs.logarchive -o out2.csv
  Time (mean ± σ):     57.804 s ±  0.129 s    [User: 53.366 s, System: 3.984 s]
  Range (min … max):   57.576 s … 57.949 s    10 runs

Summary
  ./unifiedlog_parser -i system_logs.logarchive -o out2.csv ran
    1.02 ± 0.01 times faster than ./unifiedlog_iterator -i system_logs.logarchive -o out.csv
```
Do you still see a performance regression?
I'm so sorry @puffyCid, I didn't have much time in the previous week (and it won't be better in the next ones). If it's OK on your side, don't worry and merge; the perf regression is probably in our usage (we were expecting chunks of data, and now they are chunks of 1, which could have a pretty big side effect!).
@puffyCid I have more intel here, yet I can't find where/why it's slightly slower:
- same counts
- same overall output in our product
- the new version saves a lot of RAM 🎉
- yet the new version is about 4% slower than the old one (2s over a 50s full process)
The two implementations in our product. The new one:
```rust
fn parse_file_new(&mut self, full_path: &str) {
    let Ok(log_bytes) = fs::read(full_path) else {
        return;
    };
    let log_iterator = macos_unifiedlogs::iterator::UnifiedLogIterator {
        data: log_bytes,
        header: Vec::new(),
    };
    let mut missing_unified_log_data_vec = UnifiedLogData {
        header: Vec::new(),
        catalog_data: Vec::new(),
        oversize: Vec::new(),
    };
    for mut chunk in log_iterator {
        // Move the accumulated Oversize entries into the chunk, build the
        // logs, then move the (possibly larger) cache back out
        chunk.oversize.append(&mut self.oversize_strings);
        let (results, mut missing_logs) = build_log(
            &chunk,
            &self.string_results,
            &self.shared_strings_results,
            &self.timesync_data,
            true,
        );
        self.oversize_strings = chunk.oversize;
        self.logs_count += results.len();
        self.chuncks_count += 1;
        missing_unified_log_data_vec.header.append(&mut missing_logs.header);
        missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
        missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);
    }
    self.leftover_logs.push(missing_unified_log_data_vec);
}
```
The old one is:
```rust
fn parse_file(&mut self, full_path: &str) {
    let Ok(mut log_data) =
        parse_log(full_path).map_err(|e| tracing::error!("unified log parse_file error : {e}"))
    else {
        return;
    };
    log_data.oversize.append(&mut self.oversize_strings);
    // Get all constructed logs and any log data that failed to get constructed (exclude_missing = true)
    let Ok(iterator) = iter_log(
        &log_data,
        &self.string_results,
        &self.shared_strings_results,
        &self.timesync_data,
        true,
    )
    .map_err(|e| tracing::warn!("Failed to iterate over logs: {e}"))
    else {
        return;
    };
    let mut missing_unified_log_data_vec = UnifiedLogData {
        header: Vec::new(),
        catalog_data: Vec::new(),
        oversize: Vec::new(),
    };
    for (results, mut missing_logs) in iterator {
        // Track Oversize entries
        self.logs_count += results.len();
        self.chuncks_count += 1;
        missing_unified_log_data_vec.header.append(&mut missing_logs.header);
        missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
        missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);
    }
    self.oversize_strings = log_data.oversize;
    self.leftover_logs.push(missing_unified_log_data_vec);
}
```
@jrouaix let me see if I can make some local benchmarks based on your implementation.
One interesting observation: it looks like you're appending the data individually:

```rust
missing_unified_log_data_vec.header.append(&mut missing_logs.header);
missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);
```

I think in the example I track the missing data as a `Vec<UnifiedLogData>`: `missing.push(missing_logs);`
Perhaps appending this data three times individually on every iteration (vs once per file in the original implementation) could be the cause of the regression? A sketch of that approach follows.
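For comparison, a rough sketch of the loop in the PR example (same assumptions about the lookup tables as earlier; `oversize_strings` stands in for the caller's Oversize cache):

```rust
// Sketch of the PR example's approach: collect each chunk's missing data as a
// single value instead of appending its three Vecs separately per iteration.
let mut missing: Vec<UnifiedLogData> = Vec::new();
for mut chunk in log_iterator {
    chunk.oversize.append(&mut oversize_strings);
    let (results, missing_logs) = build_log(&chunk, &strings, &shared_strings, &timesync, true);
    oversize_strings = chunk.oversize;
    missing.push(missing_logs); // one move per chunk, no per-field appends
}
```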
As mentioned, let me try to recreate this locally and see if I can replicate the performance regression.
Another idea: what if you check for empty values first? If header, catalog_data, and oversize are all empty, do not append to missing_unified_log_data_vec and just continue.
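Roughly, something like this inside the chunk loop of `parse_file_new` (a sketch grounded in the code above):

```rust
// Skip chunks that produced no missing data at all, avoiding three appends
// per iteration.
if missing_logs.header.is_empty()
    && missing_logs.catalog_data.is_empty()
    && missing_logs.oversize.is_empty()
{
    continue;
}
missing_unified_log_data_vec.header.append(&mut missing_logs.header);
missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);
```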