
Inconsistent errors when reading NDJSON with bad last line

Open philrz opened this issue 2 years ago • 1 comments

Repro is with Zed commit d599839.

The attached NDJSON test data files lines-9.ndjson.gz and lines-10.ndjson.gz both consist of several lines of valid NDJSON and a closing incomplete line:

{"syntaxerror

They otherwise only differ in that lines-10 contains one more valid NDJSON record than lines-9 before that bad last line.
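
For readers without the attachments, files of the same general shape can be generated with a short script. This is only a sketch under stated assumptions: the record contents (`id`, `msg`), the `sample-*.ndjson` file names, the number of valid records per file, and the trailing newline after the incomplete line are all made up for illustration and are not taken from the attached files. The stand-ins are much smaller than the originals, so they may not trigger the same errors.

```python
import json
from pathlib import Path


def write_truncated_ndjson(path, valid_records):
    """Write `valid_records` valid NDJSON lines followed by one incomplete line."""
    # Hypothetical record layout; the real attachments contain different data.
    lines = [json.dumps({"id": i, "msg": f"record {i}"}) for i in range(valid_records)]
    # The incomplete closing line quoted in the report.
    lines.append('{"syntaxerror')
    # Assumption: the file ends with a newline after the incomplete line.
    out = Path(path)
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out


if __name__ == "__main__":
    # Hypothetical file names, chosen so they do not clash with the attachments.
    write_truncated_ndjson("sample-9.ndjson", 9)
    write_truncated_ndjson("sample-10.ndjson", 10)
```

The two generated files differ only by one valid record before the bad last line, which is the property the report relies on.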

When both files are read with zq, the reported errors differ.

$ zq -version
Version: v1.7.0-35-gd5998393

$ zq -z lines-9.ndjson
lines-9.ndjson: parse error: string literal: unescaped line break

$ zq -z lines-10.ndjson
lines-10.ndjson: EOF

The difference becomes more significant when loading into a pool, since no error at all is reported for lines-10.ndjson: the load is committed and the exit status is 0.

$ zed -use foo load lines-9.ndjson
(1/1) 3724B/3724B 3724B/s 100.00%
Post "http://localhost:9867/pool/2OtB1htK17ZmmjRhXGdwCbXpoPp/branch/main": parse error: string literal: unescaped line break

$ zed -use foo load lines-10.ndjson
(1/1) 4134B/4134B 4134B/s 100.00%
2OtBn63BBXfmUMaoGYmM1nQSCrp committed

$ echo $?
0
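
Because the exit status cannot be relied on in this case, a caller could check the file before loading it. The following is a minimal sketch using only the Python standard library, not anything Zed provides. The function name `first_bad_ndjson_line` is hypothetical, and the check assumes every non-blank line must be a complete JSON value, which is not necessarily the rule Zed's format auto-detection applies.

```python
import json


def first_bad_ndjson_line(path):
    """Return (line_number, message) for the first line that is not valid JSON, else None."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # blank lines are tolerated in this sketch
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                # Report the first failure and stop; later lines are not checked.
                return lineno, err.msg
    return None
```

A wrapper script could call this on the input file and skip the load when the result is not `None`.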

philrz, Apr 24 '23 21:04

The fixes in #5055 have significantly improved the errors shown here. Repeating the original repro steps with Zed commit 38763f8, we now see:

$ zq -version
Version: v1.14.0-16-g38763f82

$ zq -z lines-9.ndjson
lines-9.ndjson: parse error: string literal: unescaped line break

$ zq -z lines-10.ndjson
lines-10.ndjson: unexpected end of JSON input

@mattnibs explains in https://github.com/brimdata/zed/pull/5055#issuecomment-1971566624 why we saw the improvement here for lines-10.ndjson but not lines-9.ndjson:

"this is a separate issue since in the example of lines-9.ndjson zq is choosing the zsonio reader which is where the error is coming from. [...] if I run it with the json reader specified [...] I get the expected error message."

Indeed, this is the case.

$ zq -i json lines-9.ndjson
lines-9.ndjson: unexpected end of JSON input

The improvements are similar for zed load.

$ zed load -use foo lines-9.ndjson 
(1/1) 3724B/3724B 3724B/s 100.00%
Post "http://localhost:9867/pool/2deaE3OuVrzHswcL5ja9MfZV6s1/branch/main": parse error: string literal: unescaped line break

$ zed load -use foo -i json lines-9.ndjson 
(1/1) 3724B/3724B 3724B/s 100.00%
Post "http://localhost:9867/pool/2deaE3OuVrzHswcL5ja9MfZV6s1/branch/main": unexpected end of JSON input

$ zed load -use foo lines-10.ndjson 
(1/1) 4134B/4134B 4134B/s 100.00%
Post "http://localhost:9867/pool/2deaE3OuVrzHswcL5ja9MfZV6s1/branch/main": unexpected end of JSON input

Since auto-detect is likely to be where most users start from, I'll hold this issue open in hopes we can one day do something about the part of this that @mattnibs attributes to the zsonio reader.

philrz, Mar 13 '24 23:03