jackson-databind

Performance issue: Integer wrapper over primitive type

Open andriewski opened this issue 1 year ago • 10 comments

Hi, sorry for opening an issue on such a topic.

I used the Java Microbenchmark Harness (JMH) to run some tests. One test object had 100 primitive int fields and the other had 100 Integer wrapper fields.

I really expected to see almost no difference, but in fact the Integer wrapper version is around 50% faster than the primitive one. In my case:

Integer -> 124900 ops/s
int -> 79991 ops/s

Why I think this might be a real issue: Project Valhalla might be released in upcoming Java versions (see https://blogs.oracle.com/javamagazine/post/java-jdk-18-evolution-valhalla-panama-loom-amber), and I have some concerns that migrating to primitive types might be a downgrade for your app.

Or we might need an extra layer, like an adapter between the deserialization layer and the business model.

JMH results are in this repo: https://github.com/andriewski/primitive-vs-wrapper

Maybe there are some tunings that I'm just not aware of. I hoped DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES might improve performance, but it didn't.

Feel free to give any suggestions.

andriewski, Jul 11 '22 18:07

Ok so just to make sure I understand: if you have, say:

public class PointWithPrimitives {
  public int x, y;
}

public class PointWithWrappers {
  public Integer x, y;
}

then deserializing the latter would be measurably faster than the former? That seems peculiar as I'd expect both to be roughly similar in performance -- because wrappers are indeed used internally when assigned via Reflection APIs.
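To illustrate the point about Reflection APIs, here is a minimal JDK-only sketch. It is not Jackson code, and the class `ReflectionBoxingDemo` and its method names are made up for this example. It shows that `Field.set` and `Constructor.newInstance` only accept `Object` values, so an `int` target still receives a boxed `Integer` on the way in.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;

// Hypothetical demo class for illustration; not part of Jackson.
class ReflectionBoxingDemo {
    static class PointWithPrimitives {
        public int x, y;

        public PointWithPrimitives() {}

        public PointWithPrimitives(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Field.set(Object, Object) accepts only Object values, so the int is
    // boxed by the caller and unboxed again before it lands in the int field.
    static int setViaField(int value) {
        try {
            PointWithPrimitives p = new PointWithPrimitives();
            Field fx = PointWithPrimitives.class.getField("x");
            fx.set(p, Integer.valueOf(value));
            return p.x;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Constructor.newInstance(Object...) has the same shape: int parameters
    // are passed as boxed Integer elements of an Object[].
    static int createViaConstructor(int x, int y) {
        try {
            Constructor<PointWithPrimitives> ctor =
                    PointWithPrimitives.class.getConstructor(int.class, int.class);
            Object[] args = {x, y}; // autoboxed to Integer
            PointWithPrimitives p = ctor.newInstance(args);
            return p.x + p.y;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setViaField(7));
        System.out.println(createViaConstructor(3, 4));
    }
}
```

The sketch only shows where boxing enters the picture. It does not explain why one variant would be faster than the other.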

One thing that could definitely help is use of either Afterburner or Blackbird modules: they can avoid boxing/unboxing (see https://github.com/FasterXML/jackson-modules-base).

But as to core databind, I am not sure how performance difference could be addressed. Jackson does not have much control over how Reflection passes values. Although if there are benchmarks perhaps one could play with various changes on primitive value deserialization -- deserialization for these is rather simple -- and see if anything helps avoid some unnecessary boxing+unboxing.

I assume testing was done using Jackson 2.13.3.

cowtowncoder, Jul 11 '22 22:07

@cowtowncoder Using MethodHandle you can set fields / call methods using primitives without boxing. Not sure if it would make much difference here though.
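A minimal JDK-only sketch of the MethodHandle approach mentioned here. The class `MethodHandleDemo` and its nested `Point` are hypothetical names used only for illustration, and this is not how Jackson itself accesses properties. With `invokeExact`, the call site's signature carries the `int` directly, so no `Integer` is created.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class for illustration; not part of Jackson.
class MethodHandleDemo {
    static class Point {
        public int x;

        public Point() {}

        public Point(int x) {
            this.x = x;
        }
    }

    // The setter handle has type (Point, int) -> void, so the int argument
    // is passed as a primitive rather than as an Integer.
    static int setFieldWithoutBoxing(int value) {
        try {
            MethodHandle setter =
                    MethodHandles.lookup().findSetter(Point.class, "x", int.class);
            Point p = new Point();
            setter.invokeExact(p, value);
            return p.x;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // The constructor handle has type (int) -> Point; the cast on the result
    // is required so that invokeExact sees the exact return type.
    static int constructWithoutBoxing(int value) {
        try {
            MethodHandle ctor = MethodHandles.lookup().findConstructor(
                    Point.class, MethodType.methodType(void.class, int.class));
            Point p = (Point) ctor.invokeExact(value);
            return p.x;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(setFieldWithoutBoxing(5));
        System.out.println(constructWithoutBoxing(9));
    }
}
```

Whether this makes a measurable difference for deserialization is an open question in this thread, so treat the sketch as a demonstration of the API only.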

yawkat, Jul 12 '22 13:07

@yawkat Good to know. Probably won't be usable for Jackson 2.x (if I recall correctly, MethodHandle approach was not measurably faster with Java 8, as per someone's investigation) but could be for 3.x once we raise JDK baseline. (although I guess maybe someone running on Java 11+ could benefit even with 2.x).

But yeah, would be good to know if there are other workarounds. A little bit of boxing/unboxing really should not make a drastic difference I think; there's other overhead involved in reflection access.

cowtowncoder, Jul 12 '22 16:07

@cowtowncoder thank you for your reply.

Yes, you're right, testing was done using version 2.13.3.

For testing I used classes

class LotsOPrimitives {
    private final int i1;
    ...
    private final int i100;

    @java.beans.ConstructorProperties({"i1", ..., "i100"})
    public LotsOPrimitives(int i1, ..., int i100) {
        this.i1 = i1;
        ...
        this.i100 = i100;
    }
}

and

class LotsOfWrappers {
    private final Integer i1;
    ...
    private final Integer i100;

    @java.beans.ConstructorProperties({"i1", ..., "i100"})
    public LotsOfWrappers(Integer i1, ..., Integer i100) {
        this.i1 = i1;
        ...
        this.i100 = i100;
    }
}

So each class has 100 fields (of course, just for testing purposes).

Sorry for not mentioning this important fact: deserialization is done not with default constructors + setters, but with an all-args constructor.

I guess that is quite a common approach, since the Java world is moving toward immutability.

I'll run some tests with

  • Blackbird
  • Afterburner

and tell you what has changed.

P.S. I just checked with Blackbird and it seems like it completely solved the issue, but I'll run JMH benchmarks for

  • vanilla
  • blackbird
  • afterburner

tomorrow just to be sure and post the results here.

andriewski, Jul 12 '22 18:07

Ah. The difference here likely is the use of Constructors I think... compared to tests I use (although I have not compared primitives/wrappers specifically). It'd be interesting to see if use of setters/fields would remove difference.

I agree that due to benefits of Immutability, Constructor-passing is a very important use case and should ideally be optimized.

Constructor use is, btw, not necessarily as well optimized at least by Afterburner; not sure about Blackbird. In the end, use of fields or setters is likely faster no matter what (for various reasons). But should not be big enough to necessarily matter for most use cases. And specifically difference between primitives and their wrappers should be negligible.

cowtowncoder, Jul 13 '22 01:07

@cowtowncoder It took 10h 32m just to run it :) Full results are here https://github.com/andriewski/primitive-vs-wrapper/blob/master/src/jmh/java/by/mark/primitivevswrapper/objectmapper/results.txt

Final results:

# Benchmark
# JMH version: 1.35
# VM version: JDK 17.0.3, OpenJDK 64-Bit Server VM, 17.0.3+6-LTS
# VM invoker: C:\Users\windmill\.jdks\corretto-17.0.3\bin\java.exe
# VM options: -Dfile.encoding=windows-1251 -Djava.io.tmpdir=D:\git\own\primitive-vs-wrapper\build\tmp\jmh -Duser.country=GB -Duser.language=en -Duser.variant
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 10 iterations, 10 s each
# Measurement: 10 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time

-----------------------------
deserialize using constructor

afterbirner.deserializeObjectWithALotOfPrimitivesExceededByteCache  thrpt   50   81640.639 ±  773.521   ops/s
afterbirner.deserializeObjectWithALotOfPrimitivesWithinByteCache    thrpt   50   85555.392 ±  436.549   ops/s
afterbirner.deserializeObjectWithALotOfWrappersExceededByteCache    thrpt   50  128431.137 ±  928.981   ops/s
afterbirner.deserializeObjectWithALotOfWrappersWithinByteCache      thrpt   50  132461.997 ± 1138.352   ops/s

blackbird.deserializeObjectWithALotOfPrimitivesExceededByteCache    thrpt   50  126025.889 ± 1097.347   ops/s
blackbird.deserializeObjectWithALotOfPrimitivesWithinByteCache      thrpt   50  134763.383 ±  582.898   ops/s
blackbird.deserializeObjectWithALotOfWrappersExceededByteCache      thrpt   50  123840.477 ± 1018.570   ops/s
blackbird.deserializeObjectWithALotOfWrappersWithinByteCache        thrpt   50  130179.295 ± 1282.986   ops/s

vanilla.deserializeObjectWithALotOfPrimitivesExceededByteCache      thrpt   50   81258.988 ±  827.692   ops/s
vanilla.deserializeObjectWithALotOfPrimitivesWithinByteCache        thrpt   50   84399.022 ±  291.501   ops/s
vanilla.deserializeObjectWithALotOfWrappersExceededByteCache        thrpt   50  129637.185 ± 1647.939   ops/s
vanilla.deserializeObjectWithALotOfWrappersWithinByteCache          thrpt   50  133050.295 ± 1640.740   ops/s

-----------------------------
deserialize using setters

afterbirner.deserializeObjectWithALotOfPrimitivesExceededByteCache  thrpt   50  203490.238 ± 2798.402   ops/s
afterbirner.deserializeObjectWithALotOfPrimitivesWithinByteCache    thrpt   50  226138.299 ± 2402.633   ops/s
afterbirner.deserializeObjectWithALotOfWrappersExceededByteCache    thrpt   50  188753.597 ±  659.647   ops/s
afterbirner.deserializeObjectWithALotOfWrappersWithinByteCache      thrpt   50  206448.245 ± 1423.374   ops/s

blackbird.deserializeObjectWithALotOfPrimitivesExceededByteCache    thrpt   50  156128.816 ± 1295.208   ops/s
blackbird.deserializeObjectWithALotOfPrimitivesWithinByteCache      thrpt   50  168735.433 ± 1699.003   ops/s
blackbird.deserializeObjectWithALotOfWrappersExceededByteCache      thrpt   50  156459.879 ± 1757.727   ops/s
blackbird.deserializeObjectWithALotOfWrappersWithinByteCache        thrpt   50  166642.487 ± 1679.994   ops/s

vanilla.deserializeObjectWithALotOfPrimitivesExceededByteCache      thrpt   50   83447.754 ±  947.815   ops/s
vanilla.deserializeObjectWithALotOfPrimitivesWithinByteCache        thrpt   50   91708.544 ± 1019.130   ops/s
vanilla.deserializeObjectWithALotOfWrappersExceededByteCache        thrpt   50   83004.602 ±  928.473   ops/s
vanilla.deserializeObjectWithALotOfWrappersWithinByteCache          thrpt   50   91662.166 ±  854.692   ops/s


-----------------------------
serialize

afterbirner.serializeObjectWithALotOfPrimitivesExceededByteCache    thrpt   50  325375.230 ± 5065.042   ops/s
afterbirner.serializeObjectWithALotOfPrimitivesWithinByteCache      thrpt   50  372340.380 ± 8338.254   ops/s
afterbirner.serializeObjectWithALotOfWrappersExceededByteCache      thrpt   50  310472.821 ± 2997.814   ops/s
afterbirner.serializeObjectWithALotOfWrappersWithinByteCache        thrpt   50  352965.759 ± 4296.584   ops/s

blackbird.serializeObjectWithALotOfPrimitivesExceededByteCache      thrpt   50  251730.282 ± 4628.612   ops/s
blackbird.serializeObjectWithALotOfPrimitivesWithinByteCache        thrpt   50  296152.628 ± 7248.472   ops/s
blackbird.serializeObjectWithALotOfWrappersExceededByteCache        thrpt   50  237169.427 ± 4177.438   ops/s
blackbird.serializeObjectWithALotOfWrappersWithinByteCache          thrpt   50  270338.766 ± 6136.820   ops/s

vanilla.serializeObjectWithALotOfPrimitivesExceededByteCache        thrpt   50  186049.641 ± 3160.427   ops/s
vanilla.serializeObjectWithALotOfPrimitivesWithinByteCache          thrpt   50  210561.751 ± 3595.253   ops/s
vanilla.serializeObjectWithALotOfWrappersExceededByteCache          thrpt   50  193095.202 ± 4405.213   ops/s
vanilla.serializeObjectWithALotOfWrappersWithinByteCache            thrpt   50  215780.598 ± 5129.804   ops/s

So in general vanilla loses to everyone.

In terms of deserializing using a constructor, blackbird even gives a slight advantage to primitives over wrappers. In general, deserialization of Integer wrappers works almost the same for afterburner, blackbird and vanilla.

But for serialization there is a huge difference: blackbird works ~35-40% faster than the vanilla version for primitives and ~22-25% faster than the vanilla version for wrappers, and in general serialization of primitives works 5-9% faster than serialization of wrappers.

Afterburner works 25-30% faster than blackbird.

But what is really surprising is deserialization using setters:

  • vanilla setters serialization loses to vanilla constructor serialization
  • blackbird setters serialization works ~25% faster than blackbird constructor serialization
  • afterburner setters serialization works almost 2 times faster than afterburner constructor serialization
  • afterburner setters serialization works 30% faster than blackbird setters serialization
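The percentages quoted above can be rechecked from the throughput numbers in the JMH table. The sketch below is only arithmetic on three pairs of rows copied from that table. The class `SpeedupDemo` and the method `percentFaster` are names invented for this example.

```java
// Hypothetical helper for illustration; the numbers are copied from the
// JMH results posted in this thread.
class SpeedupDemo {
    // Returns how much faster throughput `a` is than throughput `b`, in percent.
    static double percentFaster(double a, double b) {
        return (a / b - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Deserialize using constructor, vanilla, WithinByteCache:
        // wrappers vs. primitives.
        System.out.printf("%.1f%%%n", percentFaster(133050.295, 84399.022));

        // Serialize, primitives, WithinByteCache: blackbird vs. vanilla.
        System.out.printf("%.1f%%%n", percentFaster(296152.628, 210561.751));

        // Deserialize using setters, primitives, ExceededByteCache:
        // afterburner vs. blackbird.
        System.out.printf("%.1f%%%n", percentFaster(203490.238, 156128.816));
    }
}
```

These three pairs work out to roughly 58%, 41% and 30%, which is in line with the figures given in the comment.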

Just for my case: using blackbird might work, and as I understand it, it is something like a replacement for afterburner. But afterburner deserialization with setters works really well.

Could you please clarify: are there any pitfalls of using blackbird or afterburner? I've only just got acquainted with them. There is not much info about them on the internet, but they show really good performance. I'd say it seems like they should be shipped along with jackson-databind :D

andriewski, Jul 13 '22 17:07

Right, so downsides of Afterburner and Blackbird are just that the code paths they replace are not nearly as well tested as those of vanilla (de)serialization. So there may be bugs, but more commonly some newly added features might not be supported if the underlying (de)serialization changes in jackson-databind. There is also the class generation angle: both AB and BB generate lots of new classes and this leads to increased memory usage, esp. for class metadata. Depending on usage there could also be "leaks" over time. Those are probably more theoretical and could occur if retaining ObjectMapper instances accidentally (that is, ones not being used but still referenced).

This is why I'd be hesitant to add either as built-in to databind. Then again, some parts could be safer to add; both implement multiple separate optimizations and maybe some could be merged in databind, others not. For example, speculative optimization during deserialization, in which the assumption is made that the field order on reading is very likely the same as on writing, allows use of parser.nextFieldName(expectedName) to sometimes speed up decoding nicely (because the parser can just verify that the next byte sequence matches the name etc). Or for serializers, a sort of "loop unrolling" is quite safe. Neither of these requires code generation.
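The expected-order idea can be modelled in a few lines without any Jackson types. The sketch below is a simplified, hypothetical model and not the Afterburner or Blackbird implementation. The class `ExpectedOrderDemo` first checks whether the incoming property name is the next one in the expected order and only falls back to a map lookup when that guess fails. In a real parser the gain would come from comparing raw bytes against the expected name, which this model does not reproduce.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of speculative "expected order" matching.
class ExpectedOrderDemo {
    private final String[] expectedOrder;
    private final Map<String, Integer> indexByName = new HashMap<>();
    int fastPathHits = 0; // how often the ordering guess was right

    ExpectedOrderDemo(String... expectedOrder) {
        this.expectedOrder = expectedOrder;
        for (int i = 0; i < expectedOrder.length; i++) {
            indexByName.put(expectedOrder[i], i);
        }
    }

    // Reads (name, value) pairs into slots indexed by property position.
    int[] read(List<Map.Entry<String, Integer>> fields) {
        int[] slots = new int[expectedOrder.length];
        int next = 0; // index of the property we expect to see next
        for (Map.Entry<String, Integer> field : fields) {
            int index;
            if (next < expectedOrder.length
                    && expectedOrder[next].equals(field.getKey())) {
                index = next; // fast path: name is the expected one
                fastPathHits++;
            } else {
                Integer found = indexByName.get(field.getKey());
                if (found == null) {
                    continue; // unknown property: skipped in this sketch
                }
                index = found; // slow path: general lookup
            }
            slots[index] = field.getValue();
            next = index + 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        ExpectedOrderDemo demo = new ExpectedOrderDemo("x", "y", "z");
        int[] slots = demo.read(List.of(
                Map.entry("x", 1), Map.entry("y", 2), Map.entry("z", 3)));
        System.out.println(slots[0] + "," + slots[1] + "," + slots[2]
                + " fastPathHits=" + demo.fastPathHits);
    }
}
```

When the input arrives out of order, the result is still correct and only the number of fast-path hits drops, which is why the optimization is described as speculative.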

Oh, one thing: I am not sure what "serialization with constructors" here means. Constructors are only used during deserialization?

cowtowncoder, Jul 13 '22 18:07

@cowtowncoder thanks a lot for your reply! Increased memory usage, I guess, is not a huge problem, but a potential memory leak sounds really upsetting :(

It seems that if someone wants to use it in production, first of all some unit tests should be written just to check that the vanilla object mapper and AB or BB serialize/deserialize in the same way. Then some stress tests should be done, just to ensure that there is no memory leak. That's a lot of extra work. Maybe for some overloaded services it might increase requests per second, but that should also be measured :)

Basically, most developers use ObjectMapper in Spring Boot infrastructure, and if I'm not mistaken the same ObjectMapper instance is used for the whole application. So could you say whether it is still possible to have a memory leak in this case?

As you said, maybe just some improvements in databind might also work well. I didn't really expect such a big difference between primitives and wrappers when a constructor is used for deserialization, so perhaps at some point you'll find a way to improve it.

About "serialization with constructors": I can't really find where I mentioned it :) If I did, that would mean deserialization with constructors. Or if you checked my source code: I just put "Constructor" in the name of the class to distinguish it from the class that uses setters for deserialization. Not the best name, but fine just for testing :)

andriewski, Jul 13 '22 21:07

Ok just to add some commentary: memory leak part is more speculative -- more due to nature of handling. There have been some reports for Blackbird, although none that I recall for Afterburner. So this is NOT something that I hear from users on a regular basis.

I think the better way to put it is that it's good to test things out, run things for long enough time, see how they work. Just like you suggest. On plus side enabling Afterburner and/or Blackbird is trivially simple (I hope :) ), so no big code changes needed.

Now, back to possible leak: ideally, yes, if you drop ObjectMapper with registration, that should allow cleaning up of all resources. And I think this is how things work with Afterburner. There have been some concerns wrt specific usage of BB (check out an open issue for it under jackson-modules-base, I forget issue number). But typically possible retention gets back to some reference somewhere being left hanging around; maybe some Thread still has a reference to the mapper; or some event handler. Not something that happens for many or most projects, but nasty edge cases. That does not help a lot if your use case happens to be that but... fwiw, this is not a commonly reported problem.

I hope this helps.

And yeah, I figured that "serialize with Constructor" was a misnomer. Just wanted to double-check.

cowtowncoder, Jul 13 '22 23:07

@cowtowncoder thanks a lot for the clarifications! :) I wanted to ask an off-topic question about thread-safety and whether it is okay to use a single instance for the whole application, but I just checked the Javadoc of the ObjectMapper class (pretty cool doc btw), so yes, it's fully thread-safe and one instance is okay :)

So I guess having one instance shouldn't cause any memory leaks, but I'll check it for sure, at least to see what difference AB or BB makes.

I guess my question was fully clarified. I really appreciate your time, thank you!

But you might want to keep the issue open, since vanilla deserialization for primitives and wrappers, when a constructor is used, does not work the way everyone would expect :)

vanilla.deserializeObjectWithALotOfPrimitivesExceededByteCache      thrpt   50   81258.988 ±  827.692   ops/s
vanilla.deserializeObjectWithALotOfPrimitivesWithinByteCache        thrpt   50   84399.022 ±  291.501   ops/s
vanilla.deserializeObjectWithALotOfWrappersExceededByteCache        thrpt   50  129637.185 ± 1647.939   ops/s
vanilla.deserializeObjectWithALotOfWrappersWithinByteCache          thrpt   50  133050.295 ± 1640.740   ops/s

andriewski, Jul 14 '22 17:07