Tim Wojtulewicz comments

288 comments by Tim Wojtulewicz

sessions for non-analyzed protocols should still be tracked with a conn entry

Would it be alright to just move all of it to the policy? Instead of having an ID field in conn.log and a policy that adds a name field, the...

Discrepancy between JSON and TSV logs for non-printable bytes

> Maybe there was prior discussion that that's not a model appropriate for Zeek logs that I'm unaware of. @timwoj ? I don't recall any prior discussion. We likely just...

Discrepancy between JSON and TSV logs for non-printable bytes

I took a few minutes to look at this. Switching over to use `\u` notation for valid characters isn't hard. Rapidjson does it automatically. The question is what to do...

Discrepancy between JSON and TSV logs for non-printable bytes

I rather like Python's `surrogateescape` approach. It's simple and keeps the byte data consistent and the unicode characters actually valid. I think the problem here is there isn't a standard...
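
For readers unfamiliar with the `surrogateescape` handler mentioned here, a minimal Python sketch of the round trip follows. The sample bytes are made up for illustration, and Python's standard `json` module only stands in for a JSON writer; nothing here reflects how Zeek itself is implemented.

```python
import json

# Illustrative input: valid UTF-8 for "café" followed by one invalid byte (0xFF).
raw = b"caf\xc3\xa9\xff"

# With surrogateescape, each undecodable byte 0xNN becomes the lone
# surrogate U+DCNN instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so both the valid character and the escaped byte appear as \uXXXX.
encoded = json.dumps(text)

# Reversing the two steps recovers the original bytes exactly.
restored = json.loads(encoded).encode("utf-8", errors="surrogateescape")

print(encoded)
print(restored == raw)
```

The valid character and the invalid byte land in different code point ranges (the latter in U+DC80 to U+DCFF), which is what lets a consumer distinguish them and rebuild the original byte string.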

Discrepancy between JSON and TSV logs for non-printable bytes

> I'm definitely in the overthinking corner at this point, however. I could see surrogateescape as an option, but might have slight preference to the \u00xx approach assuming we do...

Discrepancy between JSON and TSV logs for non-printable bytes

I think my only concern in this is that we'll take some random string with binary data in it (like the ciphertext from kerberos) and convert some part of it...

Discrepancy between JSON and TSV logs for non-printable bytes

I think the final plan from the above is: - Convert valid UTF-8 sequences to unicode code points and encode them as `\u0000` format - Leave other invalid sequences as...

Discrepancy between JSON and TSV logs for non-printable bytes

Alright, here's another proposal after @ckreibich and I chatted a bit: - Invalid UTF-8 characters are encoded as surrogate-escaped Unicode codepoints (`\uDCxx`) - Valid UTF-8 characters are converted to unicode...

Discrepancy between JSON and TSV logs for non-printable bytes

> For control characters, [the JSON RFC](https://www.ietf.org/rfc/rfc4627.html#section-2.5) specifies using \u0001 - \u001F, so we should minimally do that always (rather than surrogate escape these). That would actually fix OP's concrete problem....
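
As a rough illustration of the control-character escaping described in the quoted text, the sketch below shows what one conforming JSON encoder produces. Python's `json` module is used purely as an example encoder, and the input string is invented; this is an assumption-laden demonstration, not Zeek's writer.

```python
import json

# Illustrative string containing two control characters (U+0001 and U+001F).
value = "a\x01b\x1fc"

# Control characters without a short escape are written as \u00XX.
escaped = json.dumps(value)

# Characters with a dedicated short form (such as tab) use that form instead.
tab_escaped = json.dumps("a\tb")

print(escaped)
print(tab_escaped)
```

Either form decodes back to the same character, so the output stays valid JSON without needing surrogate escapes for these bytes.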

Clean up/fix static analysis findings

@bbannier pointed me at the set of clang-tidy checkers that Spicy uses: https://github.com/zeek/spicy/blob/main/.clang-tidy