
Significant improvement in deserialization speed

Open k163377 opened this issue 3 years ago • 26 comments

By incorporating several speedup techniques, deserialization speed has been significantly improved. The improved use cases are as follows:

  • Calling a constructor with default arguments
  • Calling factory methods (with or without default arguments)

A simple benchmark showed that, in the fastest pattern, end-to-end deserialization ran at twice the current speed.

  • constructor.Benchmarks.useDefaultKotlinMapper: Constructor calls with default arguments
  • factory.Benchmarks.useDefaultKotlinMapper: Factory method calls using default arguments
  • factory.Benchmarks.useKotlinMapper: Factory method calls without default arguments

before:

Benchmark                                           Mode  Cnt        Score       Error  Units
c.w.constructor.Benchmarks.useDefaultKotlinMapper  thrpt    9   756210.466 ±  6515.810  ops/s
c.w.constructor.Benchmarks.useKotlinMapper         thrpt    9  1234679.204 ± 13448.457  ops/s
c.w.factory.Benchmarks.useDefaultKotlinMapper      thrpt    9   620649.893 ± 23693.451  ops/s
c.w.factory.Benchmarks.useJavaMapper               thrpt    9  3111177.275 ± 47434.465  ops/s
c.w.factory.Benchmarks.useKotlinMapper             thrpt    9   651496.500 ±  8709.582  ops/s

after:

Benchmark                                           Mode  Cnt        Score       Error  Units
c.w.constructor.Benchmarks.useDefaultKotlinMapper  thrpt    9  1419415.078 ± 15420.143  ops/s
c.w.constructor.Benchmarks.useKotlinMapper         thrpt    9  1297360.402 ±  4506.412  ops/s
c.w.factory.Benchmarks.useDefaultKotlinMapper      thrpt    9  1394035.380 ±  2169.456  ops/s
c.w.factory.Benchmarks.useJavaMapper               thrpt    9  3109088.759 ± 94393.794  ops/s
c.w.factory.Benchmarks.useKotlinMapper             thrpt    9  1323363.570 ± 16404.971  ops/s

In the benchmark after the change, the scores with default arguments are higher, but this is probably due to the difference in the number of arguments being read. In principle, if the number of arguments read is the same, the benchmark results should be almost identical.

About the speed-up techniques

Direct invocation of Java reflection

This approach provides the greatest speedup.

Currently, deserialization is done by calling KFunction, which is very slow. I therefore achieved a speedup by invoking Java reflection directly.

I used moshi-codegen as a reference for this idea.

The following article may be helpful for the principle of operation.
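
The idea can be sketched in plain Java (a standalone illustration with a hypothetical Point class, not the module's actual code): the reflective member is resolved once up front, so each deserialization is a single direct reflective call instead of a KFunction invocation.

```java
import java.lang.reflect.Constructor;

public class DirectInvoke {
    // Hypothetical value type standing in for a deserialized Kotlin data class.
    public static class Point {
        final int x;
        final int y;
        public Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) throws Exception {
        // Resolve the constructor once, at instantiator-construction time...
        Constructor<Point> ctor = Point.class.getConstructor(int.class, int.class);
        ctor.setAccessible(true); // accessibility resolved once, not per call

        // ...so per-deserialization work is reduced to one reflective invocation.
        Point p = ctor.newInstance(3, 4);
        System.out.println(p.x + "," + p.y);
    }
}
```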

Caching of accessibility/instance

Currently, reflection accessibility is evaluated on every call, which prevents caching the instance. Instead, accessibility is now cached the first time it is checked and reused thereafter. This structural change makes it possible to cache the instance.

This will resolve the following comments. https://github.com/FasterXML/jackson-module-kotlin/blob/91d8b351b61411c325d2b8b99f8305219f332ff1/src/main/kotlin/com/fasterxml/jackson/module/kotlin/KotlinValueInstantiator.kt#L69

In addition, it is now easier to determine the fallback.
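
A minimal sketch of the caching idea (the class and names are illustrative, not the module's actual code): accessibility is evaluated on the first invocation and remembered, so the cached reflective member can be reused without re-checking.

```java
import java.lang.reflect.Method;

public class CachedCaller {
    private final Method method;
    private volatile boolean checked; // has accessibility been evaluated yet?

    public CachedCaller(Method method) { this.method = method; }

    public Object call(Object target, Object... args) throws Exception {
        if (!checked) {
            // Evaluate accessibility once; afterwards the Method instance
            // is reused without repeating the check on every call.
            if (!method.canAccess(target)) {
                method.setAccessible(true);
            }
            checked = true;
        }
        return method.invoke(target, args);
    }

    public static void main(String[] args) throws Exception {
        CachedCaller caller = new CachedCaller(String.class.getMethod("toUpperCase"));
        System.out.println(caller.call("abc"));
        System.out.println(caller.call("def")); // second call: no accessibility check
    }
}
```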

Avoid spread operator

This change intentionally avoids the spread operator wherever possible, because Kotlin's spread operator has a high execution cost: it copies the array on every call.

Specifically, I use techniques such as copying into an array manually and wrapping variable-length argument calls in Java.
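
The wrapping trick can be sketched as follows (a hypothetical helper, not the module's code): a Java method that accepts the argument array directly lets callers pass an already-built array, avoiding the copy that Kotlin's spread operator would perform at the call site.

```java
public class ArrayCall {
    // Varargs entry point: calling this from Kotlin with *args copies the array.
    public static int sumVarargs(int... values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    // Java wrapper taking the array directly: a caller passes the array it
    // already holds; in Java the same array reaches the varargs method, no copy.
    public static int sumArray(int[] values) {
        return sumVarargs(values);
    }

    public static void main(String[] args) {
        int[] built = {1, 2, 3}; // argument array built during deserialization
        System.out.println(sumArray(built));
    }
}
```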

Future benefits

As a side note, I would like to write about the future benefits of this improvement.

With this approach, deserialization is performed only through Java reflection calls, which makes it easier to adopt LambdaMetafactory, for example, for further speedups in the future. It will also become easier to drop the kotlin-reflect dependency by incorporating alternatives such as kotlinx-metadata.
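
To illustrate the LambdaMetafactory direction (a standalone JDK sketch, not code from this PR): a reflectively resolved member can be bound to a functional-interface call site once, after which invocations are direct calls with no per-call reflection.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

public class LmfSketch {
    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        // Resolve String.length() once, as a method handle.
        MethodHandle impl = lookup.findVirtual(
                String.class, "length", MethodType.methodType(int.class));
        // Spin up a Function<String, Integer> call site bound to that handle.
        CallSite site = LambdaMetafactory.metafactory(
                lookup, "apply",
                MethodType.methodType(Function.class),            // call-site type
                MethodType.methodType(Object.class, Object.class), // erased SAM type
                impl,
                MethodType.methodType(Integer.class, String.class)); // instantiated type
        @SuppressWarnings("unchecked")
        Function<String, Integer> length =
                (Function<String, Integer>) site.getTarget().invokeExact();
        // From here on, calls go through the generated lambda, not reflection.
        System.out.println(length.apply("hello"));
    }
}
```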

What I would like to discuss

As you can see from the code, incorporating this idea requires some very large changes. On the other hand, I don't have a deep understanding of the design principles and coding conventions of jackson-module-kotlin, so I would appreciate your advice.

Also, I worked from the 2.12 branch because of problems running comparison benchmarks, but I am ready to rebuild this as a PR against master if needed.

I'm very sorry to suddenly throw such a huge PR at you.

Thank you.

k163377 avatar May 03 '21 09:05 k163377

One quick comment: change would definitely need to be against 2.13 branch, to minimize risk of breakage (which I think there is just due to scope).

cowtowncoder avatar May 03 '21 15:05 cowtowncoder

I cannot comment a lot on the changes, but I like the idea -- one thing I have always tried to suggest is that as much work as possible SHOULD be done once, when constructing the ValueInstantiator, and as little as possible when the instantiator is used. Nothing in the class definitions changes across later calls, so anything re-done on every call is pure overhead. This is one of the bigger reasons the Kotlin module has so much more overhead than the Java one: the Java version does very little during actual deserialization because it pre-processes information ahead of time. It looks like this same idea is used here, which is good!

One thing that would be nice, though I am not sure it is possible, would be to have a choice of two backends for 2.13: the "old" one for backwards compatibility, and a "new" one with the optimizations. Users would have to opt in to the new, faster version, and only use it if it works for them. If and when issues are resolved, the new backend would become the default in 2.14 and later. This may not be practical, but I mention it just in case it might be.

cowtowncoder avatar May 03 '21 15:05 cowtowncoder

@cowtowncoder Thanks for the reply.

One thing that would be nice, but I am not sure if it is possible, would be to have choice of two backends for 2.13: "old" one for backwards compatibility, and "new" one with optimizations. Users would have to opt-in (enable) use of the new, faster version, and only use it if it works for them. If and when issues are resolved, in 2.14 and later new backend would become the default.

That is certainly true. I too think that this change is so large that it would be preferable to be able to switch between the two.

I tried adding a property to switch between the two. What do you think? 2bc2743 75a1c3e ace6cd0

One quick comment: change would definitely need to be against 2.13 branch, to minimize risk of breakage (which I think there is just due to scope).

Understood. However, since I've already submitted the PR, I'm thinking of having it reviewed as is and changing the branch later. What do you think?

Also, which is a better way to do this change process, creating a new PR or doing a rebase?

k163377 avatar May 03 '21 23:05 k163377

After creating this PR, I noticed that the reflection processing on the read side is also cacheable. I redid a simple benchmark (with more benchmark cases, but little change in the scores) using a branch that incorporates these changes; the results are as follows.

Benchmark                                           Mode  Cnt        Score       Error  Units
c.w.constructor.Benchmarks.useDefaultKotlinMapper  thrpt    9  2513352.950 ± 47859.737  ops/s
c.w.constructor.Benchmarks.useJavaMapper           thrpt    9  3054154.327 ± 27503.185  ops/s
c.w.constructor.Benchmarks.useKotlinMapper         thrpt    9  2353756.705 ±  9910.363  ops/s
c.w.factory.Benchmarks.useDefaultKotlinMapper      thrpt    9  2438265.573 ± 57048.905  ops/s
c.w.factory.Benchmarks.useJavaMapper               thrpt    9  3070987.061 ± 32149.819  ops/s
c.w.factory.Benchmarks.useKotlinMapper             thrpt    9  2326640.414 ± 62539.462  ops/s

The Kotlin scores have risen to about 4/5 of the Java deserialization scores. The results also suggest that StrictNullCheck performance can be expected to improve significantly as well.

Initially, I intended to share the read-side processing between the experimental backend and the conventional backend, but I reverted the commits for that sharing in case I want to incorporate this content in the future. 455eae373475ca03989c40b0aef0befff8b49493

Considering the huge size of this PR, I'm thinking of creating a separate pull request later for caching the read-side reflection process; or should I include it in this one?

k163377 avatar May 05 '21 06:05 k163377

Looking through this in detail now, as you mention it's big so it'll probably take me a few days. All looks good so far, thank you for the clear explanations.

To your last question, yes, a separate PR for the read side makes sense to me.

dinomite avatar May 07 '21 10:05 dinomite

@k163377 Would you mind adding some class comments to explain the major intents of the ArgumentBucket? It's a bit dense and I'm having trouble parsing what its intent is.

dinomite avatar May 07 '21 20:05 dinomite

ArgumentBucket was where I was starting, just because it's the (second) smallest class. Tips on where to start looking to understand are appreciated! (context note: I don't actually have a thorough understanding of the internals of this module or Jackson's inner workings, I've been pretty slow getting up to speed on things)

dinomite avatar May 07 '21 20:05 dinomite

@dinomite I've added Javadoc and comments for ArgumentBucket. For other classes, I'll add some comments if the explanation seems insufficient.

k163377 avatar May 08 '21 02:05 k163377

@k163377 Great, thanks for clarifying that bitmasking—I'm a bit dense at understanding such things. Things look good to me overall, I'll give another look next week.

dinomite avatar May 08 '21 20:05 dinomite

Ok, good improvements! And pretty impressive performance gains.

@k163377 couple of general notes/answers:

  • Yes, it's fine to keep this against 2.12 before we merge, just change it before merge
  • +1 for making these changes first, make sure this merges cleanly from 2.13 to master (usually requires some small manual conflict resolutions due to renaming / adding of context passing in master), and only then add more caching.
  • Since it is possible to test old and new backends, it would be good to have either new tests for new backend, or improve some existing tests to exercise both.

cowtowncoder avatar May 09 '21 01:05 cowtowncoder

One comment/thought inspired by the changes here, @dinomite and @k163377 -- now that there are quite a few boolean-valued configuration settings, I wonder if these should be grouped as something like KotlinMapper.Feature? That way we could pass a set of configuration settings as one int internally and (optionally) expose this to users as well. Alternatively, we could use it only internally at first and keep separate config settings.

Such a change should be separate from this PR, of course, but I'll introduce the idea now. Created #442 for that idea.

cowtowncoder avatar May 09 '21 01:05 cowtowncoder

@dinomite This change will interfere with the later caching of the read-side reflection process, so I'd like you to revert it. 074d87d7d52a11f24acb6cc0f1b60af98a319872

https://github.com/FasterXML/jackson-module-kotlin/pull/439#issuecomment-832444648

k163377 avatar May 09 '21 08:05 k163377

Ahh, unfortunate. I was hoping to make those big methods a bit simpler.

dinomite avatar May 10 '21 18:05 dinomite

I have a branch that has been modified for 2.13 to make it an official PR. It is ready for force push or creation of a new PR.

  • Branch: https://github.com/k163377/jackson-module-kotlin/tree/speed_up/pr_2.13
  • Diff: https://github.com/FasterXML/jackson-module-kotlin/compare/2.13...k163377:speed_up/pr_2.13

k163377 avatar May 16 '21 06:05 k163377

The branch for 2.13 has been re-pushed to incorporate the changes in #454. https://github.com/FasterXML/jackson-module-kotlin/pull/439#issuecomment-841773663

k163377 avatar Jun 06 '21 06:06 k163377

@k163377 I think you can go ahead and force-push to this branch and get rid of the separate 2.13 branch. We're not expecting to do any further 2.12 releases (2.13 is on the horizon)

dinomite avatar Jun 07 '21 10:06 dinomite

@dinomite Completed force push.

k163377 avatar Jun 07 '21 17:06 k163377

When can this PR be merged? I'm still waiting...

qingmo avatar Oct 14 '21 05:10 qingmo

I'm sorry I haven't responded for a long time.

I don't think this PR can be merged in its current state, for two reasons: some of the processing may be insufficient once value class support is considered, and the PR is too large to review.

Therefore, I am personally thinking of dividing this PR into the following three steps.

  1. Change the current reflection process to cache as much as possible.
  2. Change to call KFunction with call, if possible.
  3. Change to use Java reflection.

However, since the reviewers seem to be very busy, I am going to wait for the current PR to be reviewed and merged first, so as not to burden them further.

k163377 avatar Oct 14 '21 06:10 k163377

Ok, I'm back at things. I've looked through the comments on here and resolved most of them, I also brought this up to date with the 2.13 branch.

I added some comments of my own, mostly asking for tests on the new code. I hope those aren't too burdensome (I know things like KotlinValueInstantiator can be so complex just to set up that it might not be possible), but tests would do well to explain how the code works.

dinomite avatar Oct 15 '21 10:10 dinomite

Also, I'll be back on this. Once some tests are in place I think this is ready to merge

dinomite avatar Oct 15 '21 10:10 dinomite

@dinomite I am currently working on solving issue #413, and this PR is expected to be heavily affected by it.

Therefore, I would like to incorporate the contents of this PR after I finish resolving #413. On the other hand, change 1 from my previous comment can be handled independently and is expected to be effective, so I think it is better to tackle it first.

Is it okay if I start with change 1?

k163377 avatar Oct 26 '21 20:10 k163377

I'll be merging #512 shortly—what's next for this PR?

dinomite avatar Jan 03 '22 15:01 dinomite

@dinomite I'm going to do the following three things.

  • Improve efficiency of argument management and speed up the case of calling factory functions without default arguments.
  • Refactoring of the KVI argument reading process.
  • Fix minor omissions in #512.

For various reasons, the PR on this will probably be issued sometime this weekend or next Monday.

k163377 avatar Jan 03 '22 16:01 k163377

Is this PR still active?

albertocavalcante avatar Dec 31 '22 06:12 albertocavalcante

I have released a project that incorporates these improvements. If you are interested, please give it a try. https://github.com/ProjectMapK/jackson-module-kogera

k163377 avatar Jan 14 '23 15:01 k163377