Zeek reader "(empty)" handling on string data type

Open philrz opened this issue 4 years ago • 0 comments

Based on black-box testing, zq (commit a7522c2 at the moment) currently seems to treat the #empty_field value (empty) as meaningful only for set and vector types, where it indicates an empty container. It looks like it should be doing the same for string types as well. I originally stumbled onto this while reading in Zeek TSV and shaped Zeek NDJSON and comparing the ZSON-format output, where it showed up in the rdp event types of the zq-sample-data. But here's proof in the form of a simple Zeek script that outputs an empty string "":

$ cat mine.zeek 
module Mine;

export {
    redef enum Log::ID += { LOG };

    type Info: record {
        my_str:           string &log;
        };

    }

event zeek_init()
    {
    Log::create_stream(Mine::LOG, [$columns=Mine::Info, $path="mine"]);

    Log::write( Mine::LOG, [$my_str=""]);
    }

Run with Zeek v4.0.0, we can see that in the Zeek TSV log the value shows up as (empty), which zq then reads in as that literal string rather than turning it back into an empty string:

$ /usr/local/zeek-4.0.0/bin/zeek local mine.zeek
WARNING: No Site::local_nets have been defined.  It's usually a good idea to define your local networks.

$ cat mine.log 
#separator \x09
#set_separator	,
#empty_field	(empty)
#unset_field	-
#path	mine
#open	2021-03-17-11-31-40
#fields	my_str
#types	string
(empty)
#close	2021-03-17-11-31-40

$ zq -version
Version: v0.29.0-132-ga7522c26

$ zq -z mine.log 
{_path:"mine",my_str:"(empty)" (bstring)} (=0)
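For reference, here is a rough sketch of the behavior I'd expect from a Zeek TSV reader. This is not zq's implementation: `parse_zeek_tsv` and `convert` are hypothetical names made up for illustration, it only handles the header lines shown in the log above, and it assumes that (empty) in a string column should map to "" just as it maps to an empty container for set and vector columns.

```python
def parse_zeek_tsv(text):
    """Parse a minimal Zeek TSV log into a list of dicts (hypothetical helper)."""
    # Defaults match the header values in the log above; headers may override them.
    sep, set_sep = "\t", ","
    empty_field, unset_field = "(empty)", "-"
    fields, types, records = [], [], []

    def convert(value, zeek_type):
        # The unset marker means "no value at all", regardless of type.
        if value == unset_field:
            return None
        is_container = zeek_type.startswith(("set[", "vector["))
        if value == empty_field:
            if is_container:
                return []
            # Assumption under discussion: strings get the same treatment.
            if zeek_type == "string":
                return ""
        if is_container:
            return value.split(set_sep)
        return value

    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("#separator"):
            # "#separator \x09": a space, then the separator as a hex escape.
            raw = line.split(" ", 1)[1]
            sep = raw.encode("ascii").decode("unicode_escape")
        elif line.startswith("#"):
            key, _, value = line[1:].partition(sep)
            if key == "set_separator":
                set_sep = value
            elif key == "empty_field":
                empty_field = value
            elif key == "unset_field":
                unset_field = value
            elif key == "fields":
                fields = value.split(sep)
            elif key == "types":
                types = value.split(sep)
            # Other headers (#path, #open, #close) are ignored in this sketch.
        else:
            values = line.split(sep)
            records.append(
                {name: convert(v, t) for name, t, v in zip(fields, types, values)}
            )
    return records


sample = (
    "#separator \\x09\n"
    "#set_separator\t,\n"
    "#empty_field\t(empty)\n"
    "#unset_field\t-\n"
    "#path\tmine\n"
    "#fields\tmy_str\n"
    "#types\tstring\n"
    "(empty)\n"
)
print(parse_zeek_tsv(sample))  # prints [{'my_str': ''}]
```

With a reader along these lines, the TSV log would round-trip to the same empty string as the NDJSON log.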

Repeating the same with the Zeek NDJSON log, Zeek renders the value as the empty string, so there's no problem:

$ /usr/local/zeek-4.0.0/bin/zeek local "LogAscii::use_json=T" mine.zeek
WARNING: No Site::local_nets have been defined.  It's usually a good idea to define your local networks.

$ cat mine.log 
{"my_str":""}

$ zq -z mine.log 
{my_str:""}

This can be filed under "You learn something new every day!" For the past couple of years of staring at Zeek logs, I thought I'd only ever seen (empty) used in contexts where it described an empty set or vector. Indeed, here's me and @henridf agreeing with each other on this observation in an internal chat:

Somehow despite all the variations I've tested, I guess I never actually checked that one! :facepalm:

philrz, Mar 17 '21 18:03