
Null bytes are handled inconsistently

Open • SOF3 opened this issue 1 year ago • 5 comments

Describe the bug

Whitespace-delimited NUL bytes are sometimes parsed as zero values but sometimes not.

To Reproduce

$ for cmd in xxd jq; do printf '1\r\x00\n\x00\n1\n\x00 \x00' | $cmd; done
00000000: 310d 000a 000a 310a 0020 00              1.....1.. .
1
0
0
1

(Btw, U+000D (carriage return) is a valid whitespace character according to RFC 8259, but it does not seem to be included in the lexer. I am not familiar with flex, so I don't know whether there is some magic going on there.)

https://github.com/jqlang/jq/blob/ed8f7154f4e3e0a8b01e6778de2633aabbb623f8/src/lexer.l#L133

Expected behavior

To be honest, I don't know what to expect for null bytes, but I would expect them to be handled more consistently.

RFC 8259 does not permit NUL bytes in the input, so when they appear outside string literals it is reasonable (although probably unnecessary) to treat them either as invalid characters or as whitespace. But magically creating a Number(0) value does not look right.
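To make the "invalid character" option concrete, here is a minimal sketch in Python (not jq's actual parser) that reads a sequence of JSON values and skips only the four whitespace characters RFC 8259 allows between tokens: space, tab, LF and CR. The helper name read_json_stream is hypothetical; it relies on the standard library's json.JSONDecoder.raw_decode, so a NUL outside a value is reported as an error instead of being turned into 0.

```python
import json

# RFC 8259, section 2: ws = *( %x20 / %x09 / %x0A / %x0D )
JSON_WHITESPACE = frozenset(" \t\n\r")


def read_json_stream(text: str) -> list:
    """Hypothetical helper: parse a sequence of JSON values strictly.

    Any character outside a value that is not RFC 8259 whitespace
    (including NUL) raises ValueError instead of being ignored.
    """
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while True:
        # Skip only the four whitespace characters the RFC permits.
        while pos < len(text) and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == len(text):
            return values
        try:
            # raw_decode parses one value starting at pos and returns
            # the value together with the index just past it.
            value, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError as err:
            raise ValueError(f"invalid input at offset {pos}: {text[pos]!r}") from err
        values.append(value)
```

With this reader, "1\r\n2" yields [1, 2] because CR counts as whitespace, while "1\x00 2 " raises ValueError at offset 1, the position of the NUL.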

Environment:

  • jq version:
$ jq --version
jq-1.6


SOF3 • May 04 '24 05:05

Meanwhile, for \x22\x00\x22 (a NUL byte between two double quotes) jq reports the following error, which seems to suggest that NUL bytes should not be allowed at all:

parse error: Unfinished string at EOF at line 1, column 1

SOF3 • May 04 '24 05:05

src/lexer.l is the lexer for the jq language, not the JSON lexer.

emanuele6 • May 04 '24 08:05

jq 1.6 is an old version; I tried your example and got a parse error:

$ printf '1\r\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 2, column 0

So, if NUL is supposed to be whitespace as you are saying (I have not checked), that is wrong; but it does not return 0 for the NULs.

emanuele6 • May 04 '24 08:05

@SOF3 The "Unfinished string" error for \x22\x00\x22 is just standard JSON, as specified in https://json.org

You cannot have literal ASCII control characters (U+0000 through U+001F) in JSON strings; DEL (U+007F) is the exception, as mentioned in the RFC. Control characters must be written as escapes such as \u0000 instead.
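For comparison, Python's standard json module applies the same rule and is strict about control characters by default. This is only an illustration with a different parser, not jq; the helper name decode_string is hypothetical.

```python
import json


def decode_string(raw: str):
    """Hypothetical helper: decode a JSON string literal, or return None if rejected."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


print(repr(decode_string('"\x00"')))     # literal NUL inside quotes: rejected -> None
print(repr(decode_string('"\\u0000"')))  # escaped NUL: accepted -> '\x00'
print(repr(decode_string('"\x7f"')))     # literal DEL: accepted -> '\x7f'
```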

emanuele6 • May 04 '24 08:05

But the parser does seem to get confused by NUL when it is used as whitespace in the input:

$ printf '1\0 2 ' | jq      # stops parsing after NUL
1
$ printf '1\0 2\n' | jq     # treats NUL as whitespace
1
2
$ printf '1\r\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 2, column 0
$ printf '1\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 3, column 0
$ printf '1\x00\x00\n1\n\x00 \x00' | jq
1
1

emanuele6 • May 04 '24 08:05