Tim Wojtulewicz

288 comments by Tim Wojtulewicz

Would it be alright to just move all of it to the policy? Instead of having an ID field in conn.log and a policy that adds a name field, the...

> Maybe there was prior discussion that that's not a model appropriate for Zeek logs that I'm unaware of. @timwoj ?

I don't recall any prior discussion. We likely just...

I took a few minutes to look at this. Switching over to use `\u` notation for valid characters isn't hard. RapidJSON does it automatically. The question is what to do...

I rather like Python's `surrogateescape` approach. It's simple and keeps the byte data consistent and the unicode characters actually valid. I think the problem here is there isn't a standard...
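For readers unfamiliar with `surrogateescape`, here is a minimal Python sketch of the behavior described above. It is not Zeek's implementation; the helper names `bytes_to_json` and `json_to_bytes` are hypothetical and exist only for illustration. Each byte that is not valid UTF-8 is mapped to a lone surrogate in the range U+DC80 to U+DCFF, so the original bytes can be recovered after a round trip through JSON.

```python
import json


def bytes_to_json(raw: bytes) -> str:
    """Hypothetical helper: render arbitrary bytes as an ASCII-only JSON string."""
    # Valid UTF-8 decodes normally; each invalid byte 0xNN becomes U+DCNN.
    text = raw.decode("utf-8", errors="surrogateescape")
    # With ensure_ascii=True (the default), non-ASCII characters, lone
    # surrogates, and control characters are all written as \uXXXX escapes.
    return json.dumps(text)


def json_to_bytes(encoded: str) -> bytes:
    """Hypothetical helper: recover the original bytes from the JSON string."""
    text = json.loads(encoded)
    # The same error handler turns U+DCNN back into the single byte 0xNN.
    return text.encode("utf-8", errors="surrogateescape")


sample = b"caf\xc3\xa9 \xff\x01"  # valid UTF-8, one invalid byte, one control char
encoded = bytes_to_json(sample)
print(encoded)  # "caf\u00e9 \udcff\u0001"
assert json_to_bytes(encoded) == sample
```

Note that this round trip relies on Python's `json` module accepting lone surrogates on input; other JSON parsers may reject or replace them.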

> I'm definitely in the overthinking corner at this point, however. I could see surrogateescape as an option, but might have slight preference to the \u00xx approach assuming we do...

I think my only concern in this is that we'll take some random string with binary data in it (like the ciphertext from Kerberos) and convert some part of it...

I think the final plan from the above is:
- Convert valid UTF-8 sequences to unicode code points and encode them as `\u0000` format
- Leave other invalid sequences as...

Alright, here's another proposal after @ckreibich and I chatted a bit:
- Invalid utf-8 characters are encoded as surrogate-escaped unicode codepoints (`\uDCxx`)
- Valid utf-8 characters are converted to unicode...

> For control-characters, [the JSON RFC](https://www.ietf.org/rfc/rfc4627.html#section-2.5) specifies using \u0001 - \u001F, so we should minimally do that always (rather than surrogate escape these). That would actually fix OPs concrete problem....

@bbannier pointed me at the set of clang-tidy checkers that Spicy uses: https://github.com/zeek/spicy/blob/main/.clang-tidy