cheshire
cheshire copied to clipboard
Always parse integers as longs
First of all, thanks for the library. I've used it extensively over many years and it always worked smoothly.
Except for today, where I was surprised to find that the default behaviour of cheshire is to convert numbers to java.lang.Integer and that blew another java interop method that was expecting a java.lang.Long.
Is there a way to set a flag so cheshire always parses numbers as Long (which I think is the less surprising behaviour given that Clojure defaults to Long)?
Thanks for the feedback. Are you talking about parsing JSON into integers, or parsing Java integers using Jackson's long functionality?
Perhaps you can show a snippet describing the issue you're seeing?
What I mean is
(class (cheshire.core/parse-string "1"))
;> java.lang.Integer
It's a surprising behaviour (at least for me). Everywhere else in Clojure Longs
are used. So if I take that result and pass it to a Java function that expects a Long it blows up in my face at runtime.
Not a bug for sure, but an interesting default choice. What is the rationale for choosing Integer instead of Long?
I think long would be better if there aren't any performance issues.
I assume large enough integers are already returned as longs?
Yes, they are:
user=> (class (decode "1234567890"))
java.lang.Integer
user=> (class (decode "12345678900"))
java.lang.Long
I believe this is something coming from Jackson, see:
JsonToken/VALUE_NUMBER_INT (.getNumberValue jp)
(VALUE_NUMBER_INT
is for all integral numbers, not only integers)
I run into this from time to time as well. It looks like a related jackson package has a way to force longs:
http://stackoverflow.com/questions/7316689/forcing-jackson-to-deserialize-to-specific-primitive-type
https://github.com/FasterXML/jackson-databind/issues/504
I don't think Cheshire is using jackson-databind though. I'm not really up to speed on what the difference between jackson-core and jackson-databind is even. It doesn't look like the core jackson JSON parser has any such option unfortunately.
This is probably "bad practice", but you can hijack parse*
like:
(alter-var-root (var cheshire.parse/parse*)
(fn [f]
(fn [^JsonParser jp key-fn bd? array-coerce-fn]
(if (= JsonToken/VALUE_NUMBER_INT (.getCurrentToken jp))
(let [nv (.getNumberValue jp)]
; cant do (.getLongValue jp) because of BigInteger possibility (and number out of range exception)
(if (= java.lang.Integer (type nv))
;; Force longs if Integer.
(long nv)
nv))
(f jp key-fn bd? array-coerce-fn)))))
I'm not sure if this is actually better than littering code with explicit (long) casts.
For context, this is where I normally get bit:
(.equals 31337 (json/parse-string "31337"))
;=> false
I'm not calling .equals
, but some Java interop code I use is, for example.
Because in Clojure:
(= 31337 (json/parse-string "31337"))
;=> true
And:
(.equals 31337 (long (json/parse-string "31337")))
;=> true
Cheshire doesn't use jackson-databind, which is a tool for serializing plain Java objects into JSON and back, it just uses the core.
I don't currently know of a way to do this with Jackson, and I'm not sure we want to coerce every number type to long necessarily..
It might be helpful to have a way to override how each JSON type is parsed. The *use-bigdecimals?*
option is a crude way of doing this specifically for floats. However, I would like something like this for integers too. In my specific use case, I want to prevent ever getting BigInteger
when an integer is encountered.
Possibly one could pass in a map with the JsonToken
types that one wants to handle specially, which parse*
then uses for a lookup? If there's no override, it could fall back to the default. The *use-bigdecimals?*
flag could then be dropped in favor of this more generic feature.
The *use-bigdecimals?*
isn't a Cheshire thing so much as it's a Jackson thing, so I don't think I'd want to drop it for a more generic version (since I assume Jackson has optimizations in place for it). I'm wondering how much performance overhead a generic option would add.
The
*use-bigdecimals?*
isn't a Cheshire thing so much as it's a Jackson thing, so I don't think I'd want to drop it for a more generic version (since I assume Jackson has optimizations in place for it). I'm wondering how much performance overhead a generic option would add.
In the code of parse.clj
, on line 69 I clearly see a conditional that's done in Clojure. It's not passed in to any of the Jackson methods AFAICT.
You're right, I was incorrectly thinking the factory options instead of the Cheshire ones. In that case, as long as it doesn't affect performance detrimentally I think it'd be reasonable to add an option to do that.