jsonb-api icon indicating copy to clipboard operation
jsonb-api copied to clipboard

Revise big number support (Section 3.16 of JSON-B spec)

Open struberg opened this issue 7 years ago • 11 comments
trafficstars

Hi!

We got an interesting case from a user over in Apache Johnzon. https://issues.apache.org/jira/browse/JOHNZON-177

They basically have a POJO which has an int field. But the sent JSON is actually longer than Integer.MAX_VALUE. Currently Johnzon kind of silently ignores this and just cuts off the value, which leads to a totally different number than sent. The same would happen for an Integer field of course.

Note that ijson rules also do not apply as the number which get sent is smaller than the maximal possible JsonNumber (basically Long.MAX_VALUE iirc).

Shouldn't this yield an Exception? And which one? Or should we kind of enhance the ijson rules and only blow up if strict-ijson is activated?

struberg avatar Jul 06 '18 10:07 struberg

From the spec:

Deserialization of a JSON value into java.lang.Byte, Short, Integer, Long, Float 
or Double instance (or their corresponding primitive types) MUST follow 
the conversion process defined in the javadoc specification for the corresponding 
parse$Type method, such as java.lang.Byte.parseByte() for Byte.

Based of above, I suppose that an Exception must be thrown because Integer.parseInt will throw an Exception in this case. The spec doesn't specify what kind of exception should it be, but it must be a subclass of JsonbException.

Yasson does it this way:

JsonbBuilder.create().fromJson("2147483648", Integer.class);

produces:

Exception in thread "main" javax.json.bind.JsonbException: Error deserialize JSON value into type: class java.lang.Integer.

m0mus avatar Jul 16 '18 10:07 m0mus

I think we can close this ticket. The current spec wording is sufficient.

struberg avatar Jul 19 '18 19:07 struberg

Yikes, implemented it and we blew up with a big bang :(

See my comment in JOHNZON-177 https://issues.apache.org/jira/browse/JOHNZON-177?focusedCommentId=16551042&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16551042

getting a byte[] in the form of [1,5,8,255]is perfectly valid from a JSON perspective. And this is what most JSON libraries will send along the wire. Sadly 255 is not a valid value for Byte.parseByte(String). It blows up with a NumberFormatException.

Note that a [1,5,8,-1] JSON would work and result in the desired output. Sadly that's not what most JSON libs in other languages send. Simply because they distinct between signd and unsigned values. Which Java doesn't.

So what could we do? We should imo relax the restriction allow to use the full binary range. That means for a Byte the valid range would be from "-128" to "255".

While this is imo almost a no-brainer for Byte it's a bit harder for bigger number types like Short, Integer and Long. Should we also relax those?

Let's focus on Short for example The valid range according to the current JSON-B spec is Short.MIN_VAL (-32768) to Short.MAX_VAL (+32767). A number of 65535 is currently illegal. Is this ok? Or do we automatically convert it to -1?

A possible solution might be to introduce an Annotation like @Unsigned which allows such extended conversion. Or we invert the logic and use a new @Signed to enforce the strict Type range.

struberg avatar Jul 20 '18 18:07 struberg

What about using bigger capacity Java type in this case?

m0mus avatar Jul 20 '18 18:07 m0mus

Copying over from our Twitter discussion:

Well, for byte[] you cannot really use that much another type, isn't? Yes, you could use a short[] and then go over and copy it short by short over to a byte[]. Would totally kill any performance. It's just that Java doesn't respect anything unsigned. Which exists elsewhere.

struberg avatar Jul 20 '18 20:07 struberg

Think we should not close that too fast since it breaks the binary users and they are really numerous. Being aligned on java parsing is nice for a lot of users but others should be able to use "binary parsing". For the default I'm not certain current one is the best default (and it would be fine to change it since it wouldn't break apps). In any case we must have an API to solve that. I think it is a general number concern and potentially string as well. Where I want to go with this sentence is that we could have a new data holder type instead of tunning the way we parse with a new annotation:

private Binary<byte[]> matrix;

The implementation would handle this type specifically which would prevent the failure due to the parse being called at mapping time (it would be lazy). The Binary interface would be:

public interface Binary<T> {
    /** @returns the raw underlying value */
    T get();
    /** validates the underlying value according the T rules */
    Binary<T> validate();
    /** returns the binary representation of the value */
    byte[] asBytes();
}

rmannibucau avatar Jul 22 '18 19:07 rmannibucau

Hmm something like @TreatAsBinary? where we just take the bitmap and apply it?

That means if we would have a short val and get {"val":65534} then we get this as value -2into our val. But of course the bitmap would remain the same 0xFFFE.

Not quite sure if this is needed, since almost everywhere we have the POJO in our hands we could also switch to a larger java type + a BeanValidation rule. Because if you have an unsigned short then the valid range is 0..65535. But switching to int will allow for much bigger values. Otoh auto-interpreting the number as signed will also lead to weird situations as 0xFFFE < 0x0002 when doing a signed comparison.

I think the only real problem is byte[]. This often really just gets passed through as the bits they are without any interpretation. And having to use a short[] in your POJO and then doing a conversion is just meeehhh :/

struberg avatar Jul 23 '18 06:07 struberg

@JsonbBinary (to respect the api naming convention) would work but if we use that Id like it to be for any possible type (all numbers and pby charsequence + their list/array/map flavors). For info, I saw this kind of thing used for all primitives but long numbers.

rmannibucau avatar Jul 23 '18 06:07 rmannibucau

In rewriting some C code to control devices on a Raspberry Pi I came across the literal 0xA5 or 165 decimal. It was an unsigned byte so my code had to look for -91. Is there a technical reason that there are no unsigned types in Java. Did an early Java language developer have a bad experience as a child and so banished unsigned from the language? I'll have to corner Brian G. at CodeOne and ask him.

omniprof avatar Oct 03 '18 16:10 omniprof

Hi @omniprof! The argument back then was that Java is designed to be as simple as possible. So they ripped every semantic redundancy out of the language in the beginning. Of course in the meantime many of those got re-introduced again ;) With signed vs unsigned it was the same. Both result in the same bit mapping. And it only differs if you interpret the values. Which is sadly exactly what happens in our very case.

struberg avatar Jan 07 '19 21:01 struberg

adding a link to this relevant issue also: https://github.com/eclipse-ee4j/jsonb-api/issues/112

aguibert avatar Apr 08 '20 00:04 aguibert