Add a fault tolerance option for multi-line json parsing
Is your feature request related to a problem? Please describe.
When one line of a multi-line (newline-delimited) JSON input cannot be parsed, parsing currently reports the error and stops. I'd like a fault tolerance option that instead throws a warning for the bad line, skips it, and continues parsing the remaining lines.
Describe the solution you'd like
Skip the line that cannot be parsed and continue parsing.
This one has been transferred to the Zed repo since that's where an enhancement would first need to exist. For instance, using Zed commit 28f95a3 and the same threelines.ndjson.gz test file that was used in https://github.com/brimdata/brim/issues/2554:
$ gzcat threelines.ndjson.gz
{"one": 1}
{"two : 2}
{"three": 3}
$ zq -version
Version: v1.2.0-64-g28f95a3c
$ zq threelines.ndjson.gz
threelines.ndjson.gz: parse error: string literal: unescaped line break
Once a mode exists to handle this at the Zed layer, an enhancement may also be required at the app layer. This would likely not be functionality we'd enable by default, so it would require one or more flags, and the user would need to be able to toggle the same flags from within the app. For instance, one flag seems necessary to specify that the input is expected to be newline-delimited JSON, and another for the user to indicate that they want to enable this tolerance mode rather than halting input when bad data is encountered.
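To make the requested behavior concrete, here is a minimal Python sketch of what such a tolerance mode could look like outside of Zed. It is not how Zed or the app implements (or would implement) this. The function name `parse_ndjson` and the `tolerant` parameter are hypothetical and stand in for the "enable tolerance mode" flag discussed above, and the sketch assumes the input really is one JSON value per line.

```python
import json
import sys
from typing import Any, Iterable, Iterator


def parse_ndjson(lines: Iterable[str], tolerant: bool = False) -> Iterator[Any]:
    """Yield one parsed value per non-empty line of newline-delimited JSON.

    Hypothetical helper for illustration only. With tolerant=False the first
    bad line raises and halts input; with tolerant=True a warning is written
    to stderr, the bad line is skipped, and parsing continues.
    """
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            if not tolerant:
                raise
            print(f"warning: line {lineno} skipped: {err.msg}", file=sys.stderr)


# Same three lines as threelines.ndjson.gz, including the malformed second line.
sample = ['{"one": 1}\n', '{"two : 2}\n', '{"three": 3}\n']
records = list(parse_ndjson(sample, tolerant=True))
print(records)
```

With `tolerant=True` the first and third records are returned and a warning for line 2 goes to stderr. With the default `tolerant=False` the sketch raises `json.JSONDecodeError` on line 2, which mirrors the current halt-on-bad-data behavior.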
It should also be noted that this is not a limitation unique to Zed tooling. For instance, the jq tool that's very common for multi-line JSON processing also fails on the same input:
$ gzcat threelines.ndjson.gz | jq .
{
  "one": 1
}
parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 3, column 2
@burpheart: I wanted to make you aware that we've come up with an idea to address this at the Zed layer that's now tracked in #4546. Another user had reported an issue similar to yours and I proposed some interim workarounds in https://github.com/brimdata/zui/issues/2756#issuecomment-1522290280 that might also work for your case. I'm going to go ahead and close this particular issue so please feel free to keep an eye on #4546 and see how you like it when the enhancement lands.