draft-ietf-jsonpath-base

inconsistent treatment of negative 0

Open gongfarmer opened this issue 1 year ago • 8 comments

Negative zero in an index selector is invalid:

index-selector      = int                        ; decimal integer

int                 = "0" /
                      (["-"] DIGIT1 *DIGIT)      ; - optional
DIGIT1              = %x31-39                    ; 1-9 non-zero digit

Negative zero in a number within a comparable is explicitly allowed:

number              = (int / "-0") [ frac ] [ exp ] ; decimal number

Practically, this inconsistency makes the parsing code more complicated. The lexer can't just raise an error when it sees "-0", because it might be OK. So this state has to be preserved until later in the parsing stage, when the abstract syntax tree shows whether this is the "allowed" one or not.

Personally I would prefer to just allow negative zero. It's harmless, easy to parse and most programming languages handle it silently. If you are writing a parser that is good enough to detect this error, it is just as easy to treat it as positive zero and move on.

gongfarmer avatar Jan 09 '25 19:01 gongfarmer

Thank you for mentioning this implementation complication. In an index position, negative numbers mean something: index from the right, where the reference point is the first position past the end of the array. -0 would then really mean that you want that first position after the array. I think the confusion that becomes visible from expressing something like this is bad enough to warrant a little additional checking.
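A minimal sketch of that reading, assuming the usual rule that a negative index i refers to position len + i. The helper names `resolve_index` and `from_right` are hypothetical and only illustrate the argument; they are not taken from the spec or any implementation.

```python
from typing import Optional


def resolve_index(index: int, length: int) -> Optional[int]:
    """Map an index selector value to an array position, or None if out of range."""
    # Negative indexes count from the right: -1 is the last element.
    position = index if index >= 0 else length + index
    return position if 0 <= position < length else None


def from_right(n: int, length: int) -> int:
    """Position reached by counting n steps back from the end of the array."""
    return length - n


items = ["a", "b", "c"]
print(resolve_index(-1, len(items)))  # 2, the last element
print(from_right(0, len(items)))      # 3, one past the last valid position
```

Read this way, a hypothetical "-0" counted from the right lands on position 3 of a three-element array, which never holds an element.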

cabo avatar Jan 09 '25 19:01 cabo

I have implemented the RFC :)

He-Pin avatar Jan 09 '25 19:01 He-Pin

I got bit by this same issue when testing against the cts.json compliance file. Using the regex definition of Number given in the RFC, -0 will scan as a number literal, but not as an int literal. In a typical scanner you would just create a NumberLiteral token for this value, and disambiguate it later as an int or float. Since floats cannot be index or slice arguments, they should be rejected at the parsing stage. I had to refactor how I matched this in the lexer. If the pattern matches a number, I check whether there is either a fractional part or an exponent part. If not, then it can't be a float, so I try to match an int. If it's a -0, it fails to match either float or int and produces a syntax error in the lexer.

Edit: Oh, I forgot that this change I made introduced the same bug you mention about comparable. I now fail the query "$[?@.a==-0]", as my scanner gives me a syntax error for -0 here.

I think the root issue is that in context, the -0 in an index or slice argument is an int, which is not allowed. But the -0 in a comparable could either be an int or a float. A -0 int would be invalid but a -0 float would be ok. I don't know how you tweak the grammar for this. Maybe you just need to implement some semantic logic at the point of reference to enforce these rules.
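One way such a check at the point of reference could look, as a sketch only. It assumes the scanner emits a single token type carrying the source text plus a flag saying whether the text is a valid int; `NumberLiteral`, `int_ok`, `as_index`, and `as_comparable` are hypothetical names, not part of any existing implementation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NumberLiteral:
    text: str     # exact source text, e.g. "-0" or "1.5"
    int_ok: bool  # assumed to be set by the scanner: True if text matches `int`


def as_index(tok: NumberLiteral) -> int:
    """Use the token as an index or slice argument; only ints are allowed."""
    if not tok.int_ok:
        raise ValueError(f"{tok.text!r} is not a valid index or slice argument")
    return int(tok.text)


def as_comparable(tok: NumberLiteral) -> Union[int, float]:
    """Use the token as a number in a comparison; any number is allowed."""
    return int(tok.text) if tok.int_ok else float(tok.text)


minus_zero = NumberLiteral("-0", int_ok=False)
print(as_comparable(minus_zero))  # -0.0
```

With this split the scanner never raises on -0 itself; the error only surfaces when the parser tries to use the token where an int is required.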

rob-ross avatar Jun 11 '25 21:06 rob-ross

Well, this issue has been a pet peeve of mine for a while so I did a little more research into it.

The JSON-Path spec is just implementing the JSON spec (RFC 8259) regarding numbers and ints.

Specifically, on pages 7-8 of the JSON spec:

number = [ minus ] int [ frac ] [ exp ]

...

int = zero / ( digit1-9 *DIGIT )

So changing this behavior in the JSON-Path spec would make that change inconsistent with the JSON spec.

I would argue that we would need to get this change made in the JSON spec before changing the JSON-Path spec. I don't know how difficult that would be. Someone would have to do a detailed analysis on "why" this change is important first, and how it affects existing codebases. I naively think that this is just expanding an existing feature, rather than removing one, constraining one, or adding a new one. So this would be one of the simpler spec changes to make. But that's a naive opinion.

As I wrote above, the issue is that although in many programming languages you can write -0, if that literal is interpreted as an int, you lose the negative context of the original literal. Ints don't have a concept of positive zero vs negative zero. (Maybe they do in math in general (?), but not on digital computers.) So the "value" of an int literal written as '-0' will just be '0'.

Floats however can remember that the literal is -0 and not just 0. JavaScript itself will parse -0 as a float literal, preserving the negative sign for subsequent operations on the float.
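The same difference can be seen in Python, used here purely as an illustration; the thread itself refers to JavaScript. The variable names are arbitrary.

```python
import json
import math

as_int = int("-0")      # integer parse: the sign is lost
as_float = float("-0")  # float parse: the sign is kept

print(as_int)                        # 0
print(as_float)                      # -0.0
print(as_float == 0.0)               # True, the two zeros compare equal
print(math.copysign(1.0, as_float))  # -1.0, but the sign bit is still there

# Python's json module treats a literal without fraction or exponent as an int.
print(json.loads("-0"))    # 0
print(json.loads("-0.0"))  # -0.0
```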

So the problem with this in the JSON-Path spec is not so much using -0 as an index or slice value. That seems harmless enough in isolation. The problem is that a -0 is a float, not an int, and you can't use a float as an array or slice index. But you can use a float in this context for comparison:

$[?@.a==-0]   # perfectly valid float comparison

So, I have come full circle on this issue and now believe that no changes should be made to the spec, and we all just have to add a few more guards in our code to reject -0 ints.

rob-ross avatar Jun 12 '25 22:06 rob-ross

> So changing this behavior in the JSON-Path spec would make that change inconsistent with the JSON spec.

I would argue that JSON only defines numbers, and the int declaration is a portion of a number, not a full value unto itself.

With this in mind, I don't think that we necessarily need to change JSON first. We're defining "integer" in addition to what JSON defines.

gregsdennis avatar Jun 13 '25 05:06 gregsdennis

> So the problem with this in the JSON-Path spec is not so much using -0 as an index or slice value.

That is not a bug, but a feature. What do you think [-0] is supposed to mean? I do not understand why some of you seem to want to extend the syntax to allow nonsensical indexes/slice arguments.

cabo avatar Jun 13 '25 08:06 cabo

Just as a reminder: The ABNF rule int is defined as

int                 = "0" /
                      (["-"] DIGIT1 *DIGIT)      ; - optional
DIGIT1              = %x31-39                    ; 1-9 non-zero digit

... and is used in index and slice selectors:

index-selector      = int                        ; decimal integer

slice-selector      = [start S] ":" S [end S] [":" [S step ]]

start               = int       ; included in selection
end                 = int       ; not included in selection
step                = int       ; default: 1

These are exactly as they should be.

As a shortcut, int is also used in JSON-Path's definition of number:

number              = (int / "-0") [ frac ] [ exp ] ; decimal number

To allow IEEE 754 -0.0 (and its alternative JSON notation -0), we need to add an alternative here, and only here.

cabo avatar Jun 13 '25 08:06 cabo

... and if your scanner implementation (you don't actually need a scanner for JSONPath) wants to mark number values that cannot be int values, you have to react to all three, -0, frac, and exp.
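For scanner-based implementations, a sketch of that three-way check as a single regular expression. The `int` and `number` parts follow the ABNF quoted above; the `frac` and `exp` patterns are assumptions based on the usual definitions ("." plus digits, "e" or "E" with optional sign plus digits), since those rules are not quoted in this thread. `NUMBER_RE` and `can_be_int` are hypothetical names.

```python
import re

# int = "0" / (["-"] DIGIT1 *DIGIT)
INT = r"(?:0|-?[1-9][0-9]*)"

# number = (int / "-0") [ frac ] [ exp ]
NUMBER_RE = re.compile(
    rf"(?P<int>{INT}|-0)"
    r"(?P<frac>\.[0-9]+)?"          # assumed: frac = "." 1*DIGIT
    r"(?P<exp>[eE][-+]?[0-9]+)?"    # assumed: exp = "e" [ "-" / "+" ] 1*DIGIT
)


def can_be_int(text: str) -> bool:
    """Return True if a number literal is also a valid `int`.

    Raises ValueError if the text is not a number literal at all.
    """
    match = NUMBER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a number literal: {text!r}")
    # React to all three markers: "-0", a fraction, or an exponent.
    return (
        match["int"] != "-0"
        and match["frac"] is None
        and match["exp"] is None
    )


for literal in ["0", "-12", "-0", "1.0", "1e2"]:
    print(literal, can_be_int(literal))
```

Only "0" and "-12" come out as usable in an index or slice position; "-0", "1.0", and "1e2" remain valid numbers for comparisons.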

cabo avatar Jun 13 '25 08:06 cabo