Inconsistent errors when reading NDJSON with a bad last line
The repro uses Zed commit d599839.
The attached NDJSON test data files lines-9.ndjson.gz and lines-10.ndjson.gz both consist of several lines of valid NDJSON followed by an incomplete final line:
{"syntaxerror
Otherwise, they differ only in that lines-10 contains one more valid NDJSON record than lines-9 before that bad last line.
When both are read with zq, the reported errors differ.
$ zq -version
Version: v1.7.0-35-gd5998393
$ zq -z lines-9.ndjson
lines-9.ndjson: parse error: string literal: unescaped line break
$ zq -z lines-10.ndjson
lines-10.ndjson: EOF
The difference becomes more significant when loading into a pool, since no error at all is reported for lines-10 and the command exits with status 0.
$ zed -use foo load lines-9.ndjson
(1/1) 3724B/3724B 3724B/s 100.00%
Post "http://localhost:9867/pool/2OtB1htK17ZmmjRhXGdwCbXpoPp/branch/main": parse error: string literal: unescaped line break
$ zed -use foo load lines-10.ndjson
(1/1) 4134B/4134B 4134B/s 100.00%
2OtBn63BBXfmUMaoGYmM1nQSCrp committed
$ echo $?
0
The fixes in #5055 have significantly improved the errors shown here. Repeating the original repro steps with Zed commit 38763f8, we now see:
$ zq -version
Version: v1.14.0-16-g38763f82
$ zq -z lines-9.ndjson
lines-9.ndjson: parse error: string literal: unescaped line break
$ zq -z lines-10.ndjson
lines-10.ndjson: unexpected end of JSON input
@mattnibs explains in https://github.com/brimdata/zed/pull/5055#issuecomment-1971566624 why we saw the improvement here for lines-10.ndjson but not lines-9.ndjson.
this is a separate issue since in the example of lines-9.ndjson zq is choosing the zsonio reader which is where the error is coming from. [...] if I run it with the json reader specified [...] I get the expected error message.
Indeed, this is the case.
$ zq -i json lines-9.ndjson
lines-9.ndjson: unexpected end of JSON input
The improvements are similar for zed load.
$ zed load -use foo lines-9.ndjson
(1/1) 3724B/3724B 3724B/s 100.00%
Post "http://localhost:9867/pool/2deaE3OuVrzHswcL5ja9MfZV6s1/branch/main": parse error: string literal: unescaped line break
$ zed load -use foo -i json lines-9.ndjson
(1/1) 3724B/3724B 3724B/s 100.00%
Post "http://localhost:9867/pool/2deaE3OuVrzHswcL5ja9MfZV6s1/branch/main": unexpected end of JSON input
$ zed load -use foo lines-10.ndjson
(1/1) 4134B/4134B 4134B/s 100.00%
Post "http://localhost:9867/pool/2deaE3OuVrzHswcL5ja9MfZV6s1/branch/main": unexpected end of JSON input
Since auto-detection is likely where most users start, I'll keep this issue open in hopes that we can eventually address the part of this that @mattnibs attributes to the zsonio reader.