Parsing text lines into records (Grok)

philrz opened this issue 3 years ago

A community user asked in a public Slack thread:

What would the best process be to parse a field of large text (syslog events in this case) into a record so that data within the longer string can be easily searched and integrated with zed query syntax?

For now, the user managed to get by using the split() function and then using record literal syntax to assign field names to the value of each element of the resulting array. However, the first thing I thought of upon hearing the inquiry was Grok, which tools like Logstash have long used to do the heavy lifting for this kind of parsing. Among Grok's benefits are the off-the-shelf patterns that can be invoked to parse standard log formats, as well as debug tools like https://grokdebug.herokuapp.com/ that help users create their own custom parsers.
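For comparison, the user's workaround can be sketched in Go, the language Zed itself is written in. This is not Zed's split() function. `splitToRecord`, the field names, and the sample line are all hypothetical. The sketch splits on spaces and names each element by position, letting the last field absorb the rest of the message:

```go
package main

import (
	"fmt"
	"strings"
)

// splitToRecord is a hypothetical stand-in for the split()-plus-record-literal
// workaround: it splits line on single spaces and assigns each element, by
// position, to the corresponding name. The last name absorbs the remainder
// of the line so a free-text message containing spaces stays in one field.
func splitToRecord(line string, names []string) (map[string]string, error) {
	parts := strings.SplitN(line, " ", len(names))
	if len(parts) != len(names) {
		return nil, fmt.Errorf("expected %d fields, got %d", len(names), len(parts))
	}
	rec := make(map[string]string, len(names))
	for i, name := range names {
		rec[name] = parts[i]
	}
	return rec, nil
}

func main() {
	// Made-up syslog-style line and field names, for illustration only.
	line := "Oct 10 19:10:01 myhost sshd[1234]: Accepted password for alice"
	names := []string{"month", "day", "time", "host", "program", "message"}

	rec, err := splitToRecord(line, names)
	if err != nil {
		panic(err)
	}
	fmt.Printf("host=%s program=%s message=%q\n", rec["host"], rec["program"], rec["message"])
}
```

Positional splitting is brittle for syslog in particular: BSD-style timestamps pad a single-digit day with an extra space (`Oct  1`), which yields an empty element and shifts every later field by one. Absorbing that kind of variation is what Grok's pattern library is for.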

I can see that there's a Go library https://github.com/vjeantet/grok, so maybe it would not be too difficult to add a Zed function that uses it.
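To make the Grok idea concrete: a Grok pattern is a regular expression in which `%{NAME:field}` references are expanded into named capture groups. The sketch below does that expansion with Go's standard `regexp` package rather than the vjeantet/grok library, and it does not attempt to reproduce that library's API. The pattern table `basePatterns` is a tiny hypothetical subset, and `compileGrok` and `parse` are illustrative names:

```go
package main

import (
	"fmt"
	"regexp"
)

// basePatterns is a tiny, hypothetical subset of the named patterns that Grok
// libraries ship with; real pattern sets are much larger and more careful.
var basePatterns = map[string]string{
	"WORD":            `\w+`,
	"INT":             `[+-]?\d+`,
	"HOSTNAME":        `[A-Za-z0-9._-]+`,
	"SYSLOGTIMESTAMP": `[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}`,
	"GREEDYDATA":      `.*`,
}

// grokRef matches a pattern reference of the form %{NAME} or %{NAME:field}.
var grokRef = regexp.MustCompile(`%\{(\w+)(?::(\w+))?\}`)

// compileGrok expands every %{NAME:field} into a named capture group and every
// bare %{NAME} into a non-capturing group, then compiles the result anchored
// to the whole line.
func compileGrok(pattern string) (*regexp.Regexp, error) {
	var expandErr error
	expanded := grokRef.ReplaceAllStringFunc(pattern, func(ref string) string {
		m := grokRef.FindStringSubmatch(ref)
		body, ok := basePatterns[m[1]]
		if !ok {
			expandErr = fmt.Errorf("unknown pattern %q", m[1])
			return ref
		}
		if m[2] == "" {
			return "(?:" + body + ")"
		}
		return "(?P<" + m[2] + ">" + body + ")"
	})
	if expandErr != nil {
		return nil, expandErr
	}
	return regexp.Compile("^" + expanded + "$")
}

// parse matches line against re and returns the named captures as a
// string-valued record; ok is false when the line does not match.
func parse(re *regexp.Regexp, line string) (rec map[string]string, ok bool) {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return nil, false
	}
	rec = make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			rec[name] = m[i]
		}
	}
	return rec, true
}

func main() {
	re, err := compileGrok(`%{SYSLOGTIMESTAMP:ts} %{HOSTNAME:host} %{WORD:program}\[%{INT:pid}\]: %{GREEDYDATA:message}`)
	if err != nil {
		panic(err)
	}
	// The double space before the single-digit day is how BSD syslog pads it.
	if rec, ok := parse(re, "Oct  1 19:10:01 myhost sshd[1234]: Accepted password for alice"); ok {
		fmt.Println(rec["ts"], "|", rec["host"], "|", rec["pid"], "|", rec["message"])
	}
}
```

Because `SYSLOGTIMESTAMP` allows one or more spaces before the day, the padded timestamp parses cleanly where positional splitting would not. This sketch anchors the pattern to the whole line as a design choice; Grok implementations generally leave anchoring to the pattern author.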

If we ever wanted to get fancy, it seems we could extend the Grok syntax to also assign a Zed data type to each created field. There's some precedent for this in the Grok used by Logstash, as described in this StackOverflow post. Logstash only supports int and float as destination data types, but Zed could differentiate itself by supporting its full set of rich types. I assume the Go library only generates strings, but perhaps we could extend it.

Oct 10 '22 19:10 philrz