cheshire icon indicating copy to clipboard operation
cheshire copied to clipboard

Always parse integers as longs

Open frankiesardo opened this issue 8 years ago • 11 comments

First of all, thanks for the library. I've used it extensively over many years and it always worked smoothly.

Except for today, where I was surprised to find that the default behaviour of cheshire is to convert numbers to java.lang.Integer and that blew another java interop method that was expecting a java.lang.Long.

Is there a way to set a flag so cheshire always parses numbers as Long (which I think is the less surprising behaviour given that Clojure defaults to Long)?

frankiesardo avatar Oct 18 '16 12:10 frankiesardo

Thanks for the feedback. Are you talking about parsing JSON into integers, or parsing Java integers using Jackson's long functionality?

Perhaps you can show a snippet describing the issue you're seeing?

dakrone avatar Oct 20 '16 16:10 dakrone

What I mean is

(class (cheshire.core/parse-string "1"))
;> java.lang.Integer

It's a surprising behaviour (at least for me). Everywhere else in Clojure Longs are used. So if I take that result and pass it to a Java function that expects a Long it blows up in my face at runtime.

Not a bug for sure, but an interesting default choice. What is the rationale for choosing Integer instead of Long?

frankiesardo avatar Oct 20 '16 16:10 frankiesardo

I think long would be better if there aren't any performance issues.

I assume large enough integers are already returned as longs?

gfredericks avatar Oct 20 '16 16:10 gfredericks

Yes, they are:

user=> (class (decode "1234567890"))
java.lang.Integer
user=> (class (decode "12345678900"))
java.lang.Long

I believe this is something coming from Jackson, see:

    JsonToken/VALUE_NUMBER_INT (.getNumberValue jp)

(VALUE_NUMBER_INT is for all integral numbers, not only integers)

dakrone avatar Oct 20 '16 16:10 dakrone

I run into this from time to time as well. It looks like a related jackson package has a way to force longs:

http://stackoverflow.com/questions/7316689/forcing-jackson-to-deserialize-to-specific-primitive-type

https://github.com/FasterXML/jackson-databind/issues/504

I don't think Cheshire is using jackson-databind though. I'm not really up to speed on what the difference between jackson-core and jackson-databind is even. It doesn't look like the core jackson JSON parser has any such option unfortunately.

This is probably "bad practice", but you can hijack parse* like:

  (alter-var-root (var cheshire.parse/parse*)
                  (fn [f]
                    (fn [^JsonParser jp key-fn bd? array-coerce-fn]
                      (if (= JsonToken/VALUE_NUMBER_INT (.getCurrentToken jp))
                        (let [nv (.getNumberValue jp)]
                          ; cant do (.getLongValue jp) because of BigInteger possibility (and number out of range exception)
                          (if (= java.lang.Integer (type nv))
                            ;; Force longs if Integer.
                            (long nv)
                            nv))
                        (f jp key-fn bd? array-coerce-fn)))))

I'm not sure if this is actually better than littering code with explicit (long) casts.

damionjunk avatar Jan 25 '17 19:01 damionjunk

For context, this is where I normally get bit:

(.equals 31337 (json/parse-string "31337"))
;=> false

I'm not calling .equals, but some Java interop code I use is, for example.

Because in Clojure:

(= 31337 (json/parse-string "31337"))
;=> true

And:

(.equals 31337 (long (json/parse-string "31337")))
;=> true

damionjunk avatar Jan 25 '17 20:01 damionjunk

Cheshire doesn't use jackson-databind, which is a tool for serializing plain Java objects into JSON and back, it just uses the core.

I don't currently know of a way to do this with Jackson, and I'm not sure we want to coerce every number type to long necessarily..

dakrone avatar Jan 26 '17 00:01 dakrone

It might be helpful to have a way to override how each JSON type is parsed. The *use-bigdecimals?* option is a crude way of doing this specifically for floats. However, I would like something like this for integers too. In my specific use case, I want to prevent ever getting BigInteger when an integer is encountered.

Possibly one could pass in a map with the JsonToken types that one wants to handle specially, which parse* then uses for a lookup? If there's no override, it could fall back to the default. The *use-bigdecimals?* flag could then be dropped in favor of this more generic feature.

sjamaan avatar Nov 28 '23 14:11 sjamaan

The *use-bigdecimals?* isn't a Cheshire thing so much as it's a Jackson thing, so I don't think I'd want to drop it for a more generic version (since I assume Jackson has optimizations in place for it). I'm wondering how much performance overhead a generic option would add.

dakrone avatar Nov 29 '23 17:11 dakrone

The *use-bigdecimals?* isn't a Cheshire thing so much as it's a Jackson thing, so I don't think I'd want to drop it for a more generic version (since I assume Jackson has optimizations in place for it). I'm wondering how much performance overhead a generic option would add.

In the code of parse.clj, on line 69 I clearly see a conditional that's done in Clojure. It's not passed in to any of the Jackson methods AFAICT.

sjamaan avatar Nov 30 '23 09:11 sjamaan

You're right, I was incorrectly thinking the factory options instead of the Cheshire ones. In that case, as long as it doesn't affect performance detrimentally I think it'd be reasonable to add an option to do that.

dakrone avatar Nov 30 '23 16:11 dakrone