macos-UnifiedLogs oxidize part 1 ... an Intro to an ambitious vNext ?

Hi,

I was looking to implement a "simple" LogDataIterator but I couldn't make sens about all the &mut structs that are passed around just to pick some value in it when the function exit. => https://github.com/shindan-io/macos-UnifiedLogs/commit/2a6e543009de28040b96003723ffd435e6addcc1

And so while digging I found a lot to improve in the current code base and started some big rewrites.

avoid mut
better nom usage => a lot shorter code
avoid &ref on Copy types
const when consts

I really think the code could be way more maintainable, and it could help a lot to implement other features, or other apis/usages around this lib.

Would you have a look at this PR ? If you like it, I'll have a lot more to provide.

I hope I didn't do any regression, but I don't have the same test data to ensure.

We (myself & @mrguiman) think we can really help improving this lib (memory consumption & perhaps some perf). And we (Shindan) rely on it, so we have a green light from our employer to contribute.

To you have a plan to version 2 ?

Perhaps it could be a good moment to think about a roadmap, we could share the heavy lifting on adding features while also oxidize (aka : make the code Rusty) the rest of the existing code.

let us know @puffyCid

Aug 20 '24 13:08 jrouaix

added some more : dsc & uuidtext

Aug 23 '24 11:08 jrouaix

thanks @jrouaix for the PR. I would like to merge #20 first before doing an in-depth review. #20 Adds some CI support which hopefully will catch any possible regressions (fingers crossed).

Couple comments/questions from brief look:

[test_data] feature. Was this added due to the confusion on tests? As mentioned in BUILDING.md. The tests.zip file can be downloaded from the GitHub release page and extracted to your cloned repo. You should be able to then run all the tests. A macOS systems is required to run some. I think ideally all tests should be run by default. They should not take long. However, if you want, i think perhaps changing [test_data] to something like [test_live_system] and adding that to the tests:

test_collect_strings_system()
test_collect_timesync_system()
test_collect_shared_strings_system()

Could be useful. These test require a macOS system, hiding these behind a feature flag would be helpful for Linux and Windows users who try to run the tests 2. The conversion from let to const for some variables. Is this mainly, a change for readability (using uppercase variable names)? 3. Avoiding &ref on Copy types. I usually, pass by reference when possible. Just curious on the change to Copy. Is that again a readability change or just trying to make the code more Rusty? 4. Adding the anyhow dependency. I see this only for tests? Just curious why the additional dependency? I usually try to aim for minimal dependencies. Since this additional dev-dependency not a big issue. Just curious what is it helping with?

In regards to library Roadmap. The major things that come to mind:

Support macOS Sequoia. Hopefully, #20 completes this.
Memory and Performance improvements. This was my first large/complex Rust project. Looking back definitely some things that could probably be done differently. If you see opportunities that could make the project more Rusty or improvement memory/performance. Definitely interested in changes or ideas
Support more custom objects in logs. Currently the library supports a variety of objects/structs in the log data ([decoders])(https://github.com/mandiant/macos-UnifiedLogs/tree/main/src/decoders). However, not all are supported (ex: #10). I have been busy with other code projects and have not had time to tackle some of them

Are there any other things you think could be worth adding/changing?

Aug 25 '24 21:08 puffyCid

Hey @puffyCid, thank you for the answer

Let's answer on my side: 1 =>

my bad, I didn't RTFM, so I though the test_data was kept private, and I was wrong, so i'll remove the test_data feature I used to filter out thoses tests.
for the mac os only tests, a simple #[cfg(target_os = "macos")] will do the trick
it seems my rewrite has a lot of regressions, I did not expect so much, will debug it before you have a look

2 =>

yes, const and SCREAMING_CASE are the idiomatic way to store constant values in rust, i'm not sure is has any perf improvement in this specific case, but at least you get to know it's a compare to constant (so kinda static behavior instead or more dynamic behavior) when reading

3 =>

you're right, passing refs is a good way to share data to a function.
unless the data passed is so small that you don't really get any perf improvement not cloning it (you copy a ref, and have to deref to read it, it's slower than copying a usize).
that's why when a struct it super small, we just mark it Copy and pass it around wihout bothering about refs anymore, then the compiler can .clone() it instead of moveing it : https://doc.rust-lang.org/stable/std/marker/trait.Copy.html / https://doc.rust-lang.org/book/appendix-03-derivable-traits.html?highlight=copy#clone-and-copy-for-duplicating-values

4 =>

yes you right about avoiding dependencies, specially anyhow won't help in a library, it's more Result type used in end projects (or tests).
for libs it's recommended to implement a specifiq Error type (thiserror crate is super usefull for that, but can do it by hand)
My preference for using anyhow in tests is that I try to avoid having code that can panic, even in tests, so I avoid using .unwrap() or expect("...")
so anyhow in the tests allow me to simply use the nice ? operator.

Now about the Roadmap, I hope I can help a lot on memory & perfs. This is you first project ? You @puffyCid are Mr A. Hlcmb ?

And we had some ideas :

iter over files

having low level iterators could reduce memory consumption a lot

zero copy

We could (not sure it's possible) just have low level iterator of LogData<'lifetime_of_file_byte_array>. Such a parse would be very lightweight, check constraints but be super fast.

lazy formatting

Assuming the previous one LogData could just embed refs to what is needed to format messages and allocate copies of the data (when needed). We could allow formatting with a .message() function. This would allow us to have some use case like "scan throught the unified log looking for some message format (&str) and extract values only when match"

Perhaps I should write and issue for each of thoses, in order to get comments

Aug 26 '24 13:08 jrouaix

abandonned PR, too big, let's make it more digest for the maintainers : https://github.com/mandiant/macos-UnifiedLogs/pull/33

Nov 21 '24 15:11 jrouaix