search_replace function
tl;dr
A community zync user had created a `switch`-based Zed program for a large search-and-replace task in their data transformation pipeline. They asked for some assistance in simplifying it for maintainability. We came up with some improvements using existing Zed building blocks, but in the end @mccanne had the following thought for something new we could write to nail this directly:
> What if we used capture groups in Go's regexp library to create a `search_replace` function? You could give it the map and we would translate the patterns to capture groups then use the capture group index to select the replacement string.
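The capture-group idea in the quote above can be sketched in Go roughly as follows. This is a hypothetical illustration, not Zed code: the names `rule`, `searchReplacer`, `newSearchReplacer`, and `lookup` are made up here, and the sketch assumes the map arrives as an ordered list of pattern/replacement pairs. Each pattern is wrapped in its own capture group, the groups are joined with `|`, and whichever group participated in the match selects the replacement. Patterns that contain groups of their own shift the numbering, so the sketch uses `NumSubexp` to track where each top-level group lands.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule is a hypothetical pattern/replacement pair (one entry of the map).
type rule struct {
	pattern     string
	replacement string
}

// searchReplacer holds one combined regexp: (p1)|(p2)|...
type searchReplacer struct {
	re           *regexp.Regexp
	groupIndex   []int // capture group number of each rule's wrapping group
	replacements []string
}

func newSearchReplacer(rules []rule) (*searchReplacer, error) {
	parts := make([]string, len(rules))
	groupIndex := make([]int, len(rules))
	replacements := make([]string, len(rules))
	next := 1 // capture groups are numbered from 1
	for i, r := range rules {
		// Compile each pattern alone to validate it and count its own groups.
		sub, err := regexp.Compile(r.pattern)
		if err != nil {
			return nil, fmt.Errorf("pattern %q: %w", r.pattern, err)
		}
		parts[i] = "(" + r.pattern + ")"
		groupIndex[i] = next
		next += 1 + sub.NumSubexp() // skip the wrapper plus any nested groups
		replacements[i] = r.replacement
	}
	re, err := regexp.Compile(strings.Join(parts, "|"))
	if err != nil {
		return nil, err
	}
	return &searchReplacer{re: re, groupIndex: groupIndex, replacements: replacements}, nil
}

// lookup returns the replacement of the rule whose group took part in the
// match, or def when nothing matches.
func (s *searchReplacer) lookup(text, def string) string {
	loc := s.re.FindStringSubmatchIndex(text)
	if loc == nil {
		return def
	}
	for i, g := range s.groupIndex {
		if loc[2*g] >= 0 { // start offset is -1 for groups that did not match
			return s.replacements[i]
		}
	}
	return def
}

func main() {
	sr, err := newSearchReplacer([]rule{
		{`foo|bar`, "The Foobar category"},
		{`a.*\-z.*`, "The A-to-Z category"},
	})
	if err != nil {
		panic(err)
	}
	for _, d := range []string{"It's foo time", "a is the first letter-z is last", "Something else"} {
		fmt.Println(sr.lookup(d, "The default category"))
	}
	// prints:
	// The Foobar category
	// The A-to-Z category
	// The default category
}
```

One design wrinkle: Go's `regexp` reports the leftmost match in the text, so when several patterns match, the one whose match starts earliest wins, and list order only breaks ties between matches starting at the same position. With the two rules above, `"xa-z foo"` selects the A-to-Z category even though `foo|bar` is listed first; checking the patterns one at a time, in order, would pick the Foobar category instead.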
Details
At the time this issue is being filed, Zed is at commit 9766d17.
The original question from the community zync user was posed as:
> do you know if there is way to declare a const with a regular expression ? ex:
> const myregexp=/myr.+n/1
Indeed, this is not currently possible in Zed. There are other examples of places where a user might expect to be able to use regexps but can't (#4917).
To illustrate the use case, the user shared their program, but it's confidential and can't be pasted here. However, here's a simplified program `switch-grep.zed` that uses their approach:
```
switch (
  case (grep(/foo|bar/, desc)) => category := "The Foobar category"
  case (grep(/a.*\-z.*/, desc)) => category := "The A-to-Z category"
  default => category := "The default category"
)
```
With input data `descriptions.zson`:
```
{"desc": "It's foo time"}
{"desc": "a is the first letter-z is last"}
{"desc": "Something else"}
```
Running it:
```
$ zq -version
Version: v1.17.0-1-g9766d17d

$ zq -I switch-grep.zed descriptions.zson
{desc:"It's foo time",category:"The Foobar category"}
{desc:"Something else",category:"The default category"}
{desc:"a is the first letter-z is last",category:"The A-to-Z category"}
```
The user's program actually had about 70 `case` statements.
In addition to its sheer size, a couple of other challenges with maintainability are evident here:
- There's a lot of code repeated in support of the actual strings that form the search/replace pairs (i.e., `case (grep(..., desc)) => category := ...`).
- Per the user's point, if the regexps could be defined as `const`, they could be more easily reused in other contexts (e.g., defined in a separate file that's included with `zq -I ...` to be invoked in many programs).
After a little hacking, I found this could be improved to this `switch-regexp.zed`:
```
const changes = |{
  "foo|bar": "The Foobar category",
  "a.*\\-z.*": "The A-to-Z category"
}|

category := coalesce((over changes with desc | switch (case regexp(key, desc) != null => yield value | head 1)), "The default category")
```
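To help read the Zed program above, here is a rough Go analogy of the first-match-wins-with-default logic that the `coalesce`/`over`/`switch`/`head 1` combination expresses. It is not how Zed evaluates the query; `change` and `categorize` are hypothetical names, and the sketch assumes the map entries are tried in the order they're written, which the slice makes explicit.

```go
package main

import (
	"fmt"
	"regexp"
)

// change is a hypothetical stand-in for one key/value entry of the map:
// key is a regexp pattern held as a string, value is the category.
type change struct {
	key   string
	value string
}

// categorize tries each change in order and returns the value of the first
// pattern that matches desc (like switch + head 1), or def if none match
// (the role coalesce plays with its fallback argument).
func categorize(changes []change, desc, def string) (string, error) {
	for _, c := range changes {
		matched, err := regexp.MatchString(c.key, desc)
		if err != nil {
			return "", err
		}
		if matched {
			return c.value, nil
		}
	}
	return def, nil
}

func main() {
	changes := []change{
		{`foo|bar`, "The Foobar category"},
		{`a.*\-z.*`, "The A-to-Z category"},
	}
	for _, desc := range []string{"It's foo time", "a is the first letter-z is last", "Something else"} {
		category, err := categorize(changes, desc, "The default category")
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s => %s\n", desc, category)
	}
}
```

`regexp.MatchString` recompiles the pattern on every call, which keeps the sketch short; a real implementation would compile the patterns once up front.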
Running it gives the same records we saw before (in a different order).
```
$ zq -I switch-regexp.zed descriptions.zson
{desc:"It's foo time",category:"The Foobar category"}
{desc:"a is the first letter-z is last",category:"The A-to-Z category"}
{desc:"Something else",category:"The default category"}
```
Things to highlight:
- By using `over`, we're able to avoid repeating all the `case` clauses.
- By using the `regexp` function instead of `grep`, we're able to leverage the former's unique ability to take a regular expression that's defined as a string, and strings can be defined via `const`. However, this comes with one caveat: some escape sequences that are valid in a regex but not in a string (such as the `\-` in the second pattern) now need a double backslash.
The user was satisfied with this improvement. However, having watched this unfold, @mccanne had the idea quoted above for a purpose-built `search_replace` function that would spare the user from needing to know or look up the `coalesce`/`over`/`switch`/`regexp` combination shown here. If regexps in their `/ /` form also became first-class concepts at the same time, that would surely be convenient for the user as well, since they could avoid the "double backslash" overhead when creating the map (and hence more easily reuse regexps from other tools without modification), but this seems orthogonal.