structured-merge-diff
Migrate github.com/json-iterator/go to sigs.k8s.io/json
fixes https://github.com/kubernetes-sigs/structured-merge-diff/issues/202
Replaces "github.com/json-iterator/go" with "sigs.k8s.io/json" and "encoding/json".
TODO: compare benchmarks before and after (help would be appreciated here).
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: inteon. Once this PR has been reviewed and has the lgtm label, please assign apelisse for approval. For more information, see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Yeah, that's ambitious. There's a reason we had written the code the way we did, but let's see what the benchmarks say.
You can do ... IIRC
go test -bench=. ./fieldpath/
to run the benchmarks. Do that against master and the tip of your branch, save the output in files, run benchstat master.txt pr.txt, and see what comes out (you might need to add a -count=10 to your go test invocation).
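Spelled out, that workflow looks something like the following (the branch name is a placeholder):

```shell
# On master: run each benchmark 10 times so benchstat can compute
# statistics across samples, and save the output.
git checkout master
go test -bench=. -benchmem -count=10 ./fieldpath/ > master.txt

# Same invocation at the tip of the PR branch (name is hypothetical).
git checkout my-feature-branch
go test -bench=. -benchmem -count=10 ./fieldpath/ > pr.txt

# Compare the two runs.
benchstat master.txt pr.txt
```

The -benchmem flag is what produces the B/op and allocs/op columns in the comparisons below.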
Benchmarks will be essential here. This code was hyper-optimized because it was on a hot path.
This is what I get:
> benchstat master.txt pr.txt
goos: darwin
goarch: arm64
pkg: sigs.k8s.io/structured-merge-diff/v4/fieldpath
│ master.txt │ pr.txt │
│ sec/op │ sec/op vs base │
FieldSet/serialize-20-14 3.369µ ± ∞ ¹ 14.304µ ± ∞ ¹ +324.58% (p=0.008 n=5)
FieldSet/deserialize-20-14 10.18µ ± ∞ ¹ 38.24µ ± ∞ ¹ +275.51% (p=0.008 n=5)
FieldSet/serialize-50-14 8.939µ ± ∞ ¹ 38.017µ ± ∞ ¹ +325.29% (p=0.008 n=5)
FieldSet/deserialize-50-14 26.22µ ± ∞ ¹ 105.66µ ± ∞ ¹ +303.03% (p=0.008 n=5)
FieldSet/serialize-100-14 29.20µ ± ∞ ¹ 108.68µ ± ∞ ¹ +272.22% (p=0.008 n=5)
FieldSet/deserialize-100-14 67.92µ ± ∞ ¹ 327.90µ ± ∞ ¹ +382.79% (p=0.008 n=5)
FieldSet/serialize-500-14 143.1µ ± ∞ ¹ 514.4µ ± ∞ ¹ +259.32% (p=0.008 n=5)
FieldSet/deserialize-500-14 310.6µ ± ∞ ¹ 1650.6µ ± ∞ ¹ +431.36% (p=0.008 n=5)
FieldSet/serialize-1000-14 316.4µ ± ∞ ¹ 1081.6µ ± ∞ ¹ +241.84% (p=0.008 n=5)
FieldSet/deserialize-1000-14 664.7µ ± ∞ ¹ 3654.3µ ± ∞ ¹ +449.78% (p=0.008 n=5)
geomean 52.16µ 219.8µ +321.31%
¹ need >= 6 samples for confidence interval at level 0.95
│ master.txt │ pr.txt │
│ B/op │ B/op vs base │
FieldSet/serialize-20-14 1.563Ki ± ∞ ¹ 8.446Ki ± ∞ ¹ +440.22% (p=0.008 n=5)
FieldSet/deserialize-20-14 9.722Ki ± ∞ ¹ 29.034Ki ± ∞ ¹ +198.65% (p=0.008 n=5)
FieldSet/serialize-50-14 4.530Ki ± ∞ ¹ 23.527Ki ± ∞ ¹ +419.34% (p=0.008 n=5)
FieldSet/deserialize-50-14 19.94Ki ± ∞ ¹ 81.76Ki ± ∞ ¹ +310.11% (p=0.008 n=5)
FieldSet/serialize-100-14 15.67Ki ± ∞ ¹ 82.53Ki ± ∞ ¹ +426.74% (p=0.008 n=5)
FieldSet/deserialize-100-14 58.83Ki ± ∞ ¹ 300.17Ki ± ∞ ¹ +410.27% (p=0.008 n=5)
FieldSet/serialize-500-14 72.61Ki ± ∞ ¹ 396.84Ki ± ∞ ¹ +446.52% (p=0.008 n=5)
FieldSet/deserialize-500-14 270.7Ki ± ∞ ¹ 1548.6Ki ± ∞ ¹ +472.07% (p=0.008 n=5)
FieldSet/serialize-1000-14 157.9Ki ± ∞ ¹ 845.1Ki ± ∞ ¹ +435.13% (p=0.008 n=5)
FieldSet/deserialize-1000-14 593.6Ki ± ∞ ¹ 3325.8Ki ± ∞ ¹ +460.25% (p=0.008 n=5)
geomean 34.42Ki 170.0Ki +394.01%
¹ need >= 6 samples for confidence interval at level 0.95
│ master.txt │ pr.txt │
│ allocs/op │ allocs/op vs base │
FieldSet/serialize-20-14 8.000 ± ∞ ¹ 334.000 ± ∞ ¹ +4075.00% (p=0.008 n=5)
FieldSet/deserialize-20-14 232.0 ± ∞ ¹ 582.0 ± ∞ ¹ +150.86% (p=0.008 n=5)
FieldSet/serialize-50-14 14.00 ± ∞ ¹ 926.00 ± ∞ ¹ +6514.29% (p=0.008 n=5)
FieldSet/deserialize-50-14 659.0 ± ∞ ¹ 1649.0 ± ∞ ¹ +150.23% (p=0.008 n=5)
FieldSet/serialize-100-14 40.00 ± ∞ ¹ 3125.00 ± ∞ ¹ +7712.50% (p=0.008 n=5)
FieldSet/deserialize-100-14 2.298k ± ∞ ¹ 5.893k ± ∞ ¹ +156.44% (p=0.008 n=5)
FieldSet/serialize-500-14 185.0 ± ∞ ¹ 15607.0 ± ∞ ¹ +8336.22% (p=0.008 n=5)
FieldSet/deserialize-500-14 11.47k ± ∞ ¹ 29.62k ± ∞ ¹ +158.09% (p=0.008 n=5)
FieldSet/serialize-1000-14 393.0 ± ∞ ¹ 33653.0 ± ∞ ¹ +8463.10% (p=0.008 n=5)
FieldSet/deserialize-1000-14 25.00k ± ∞ ¹ 64.48k ± ∞ ¹ +157.87% (p=0.008 n=5)
geomean 356.2 4.720k +1225.15%
¹ need >= 6 samples for confidence interval at level 0.95
/hold Thanks @apelisse! Holding based on those numbers to make sure we don't cause a performance regression.
I applied a few hacks and optimisations in https://github.com/kubernetes-sigs/structured-merge-diff/pull/257/commits/59b48609980cc1fdede9bb648650281fd55a13ba and https://github.com/kubernetes-sigs/structured-merge-diff/pull/257/commits/4f4fbffba59b2050d7301a3f9661f089f8f0c70a. These changes improved the performance quite a bit compared to the initial version. However, the performance is still a bit worse than the current version on master (bench.old.txt is master and bench.v1.txt is the latest version of this PR):
$ benchstat bench.old.txt bench.v1.txt
goos: linux
goarch: amd64
pkg: sigs.k8s.io/structured-merge-diff/v4/fieldpath
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
│ bench.old.txt │ bench.v1.txt │
│ sec/op │ sec/op vs base │
FieldSet/serialize-20-6 3.526µ ± ∞ ¹ 7.008µ ± ∞ ¹ +98.75% (p=0.029 n=4)
FieldSet/deserialize-20-6 9.038µ ± ∞ ¹ 16.972µ ± ∞ ¹ +87.78% (p=0.029 n=4)
FieldSet/serialize-50-6 11.14µ ± ∞ ¹ 19.83µ ± ∞ ¹ +78.06% (p=0.029 n=4)
FieldSet/deserialize-50-6 25.43µ ± ∞ ¹ 52.32µ ± ∞ ¹ +105.80% (p=0.029 n=4)
FieldSet/serialize-100-6 39.24µ ± ∞ ¹ 68.09µ ± ∞ ¹ +73.52% (p=0.029 n=4)
FieldSet/deserialize-100-6 86.16µ ± ∞ ¹ 181.32µ ± ∞ ¹ +110.44% (p=0.029 n=4)
FieldSet/serialize-500-6 202.0µ ± ∞ ¹ 358.5µ ± ∞ ¹ +77.47% (p=0.029 n=4)
FieldSet/deserialize-500-6 412.8µ ± ∞ ¹ 868.4µ ± ∞ ¹ +110.33% (p=0.029 n=4)
FieldSet/serialize-1000-6 473.1µ ± ∞ ¹ 860.7µ ± ∞ ¹ +81.93% (p=0.029 n=4)
FieldSet/deserialize-1000-6 822.5µ ± ∞ ¹ 1984.8µ ± ∞ ¹ +141.31% (p=0.029 n=4)
geomean 63.03µ 123.3µ +95.57%
¹ need >= 6 samples for confidence interval at level 0.95
│ bench.old.txt │ bench.v1.txt │
│ B/op │ B/op vs base │
FieldSet/serialize-20-6 1.536Ki ± ∞ ¹ 1.621Ki ± ∞ ¹ +5.53% (p=0.029 n=4)
FieldSet/deserialize-20-6 9.745Ki ± ∞ ¹ 15.893Ki ± ∞ ¹ +63.09% (p=0.029 n=4)
FieldSet/serialize-50-6 4.463Ki ± ∞ ¹ 4.282Ki ± ∞ ¹ -4.05% (p=0.029 n=4)
FieldSet/deserialize-50-6 20.02Ki ± ∞ ¹ 46.73Ki ± ∞ ¹ +133.43% (p=0.029 n=4)
FieldSet/serialize-100-6 15.71Ki ± ∞ ¹ 16.22Ki ± ∞ ¹ +3.28% (p=0.029 n=4)
FieldSet/deserialize-100-6 58.98Ki ± ∞ ¹ 164.79Ki ± ∞ ¹ +179.39% (p=0.029 n=4)
FieldSet/serialize-500-6 72.84Ki ± ∞ ¹ 64.28Ki ± ∞ ¹ -11.75% (p=0.029 n=4)
FieldSet/deserialize-500-6 270.1Ki ± ∞ ¹ 842.0Ki ± ∞ ¹ +211.73% (p=0.029 n=4)
FieldSet/serialize-1000-6 154.4Ki ± ∞ ¹ 128.3Ki ± ∞ ¹ -16.91% (p=0.029 n=4)
FieldSet/deserialize-1000-6 593.1Ki ± ∞ ¹ 1805.6Ki ± ∞ ¹ +204.42% (p=0.029 n=4)
geomean 34.27Ki 52.94Ki +54.48%
¹ need >= 6 samples for confidence interval at level 0.95
│ bench.old.txt │ bench.v1.txt │
│ allocs/op │ allocs/op vs base │
FieldSet/serialize-20-6 8.000 ± ∞ ¹ 7.000 ± ∞ ¹ -12.50% (p=0.029 n=4)
FieldSet/deserialize-20-6 231.0 ± ∞ ¹ 253.0 ± ∞ ¹ +9.52% (p=0.029 n=4)
FieldSet/serialize-50-6 14.000 ± ∞ ¹ 9.000 ± ∞ ¹ -35.71% (p=0.029 n=4)
FieldSet/deserialize-50-6 661.0 ± ∞ ¹ 743.5 ± ∞ ¹ +12.48% (p=0.029 n=4)
FieldSet/serialize-100-6 40.00 ± ∞ ¹ 10.00 ± ∞ ¹ -75.00% (p=0.029 n=4)
FieldSet/deserialize-100-6 2.296k ± ∞ ¹ 2.611k ± ∞ ¹ +13.74% (p=0.029 n=4)
FieldSet/serialize-500-6 184.00 ± ∞ ¹ 13.00 ± ∞ ¹ -92.93% (p=0.029 n=4)
FieldSet/deserialize-500-6 11.48k ± ∞ ¹ 13.07k ± ∞ ¹ +13.85% (p=0.029 n=4)
FieldSet/serialize-1000-6 392.50 ± ∞ ¹ 14.00 ± ∞ ¹ -96.43% (p=0.029 n=4)
FieldSet/deserialize-1000-6 24.99k ± ∞ ¹ 28.52k ± ∞ ¹ +14.12% (p=0.029 n=4)
geomean 355.8 170.7 -52.03%
¹ need >= 6 samples for confidence interval at level 0.95
For context, there were quite a few months of engineering effort to optimize the performance of SSA, and in particular, managed fields. I'm supportive of making this code better, but I feel responsible to ensure that we don't regress significantly here.
This PR aims to be the very best attempt at making https://github.com/kubernetes-sigs/structured-merge-diff/issues/202 work. If we do not achieve the required performance, this PR will be proof that it is not possible to switch to the "encoding/json" package unless the performance of that package is improved.
@inteon: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-structured-merge-diff-vet | 26b574790dfc0f5b10c649b4bbf7fe2dce5cf483 | link | true | /test pull-structured-merge-diff-vet |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
For more context, modern-go/reflect2 and json-iterator/go are both unmaintained and broken. #202 documents one problem with reflect2. There are more. For example, it uses some braindead constructs that make it impossible for golang to apply dead code elimination to anything that depends on reflect2, even transitively. Since structured-merge-diff and k8s.io/apimachinery are dependencies of everything that uses k8s, reflect2 hurts everyone by inhibiting the DCE.
reflect2 is the cause of more trouble. For example, see https://github.com/golang/go/commit/ef225d1c57a97af984af114ee52005314530bbe2. I very much regret that the golang authors chose to accommodate reflect2 when releasing go 1.18, and keep adding kludges for it.
Suck it up and be correct first.
I don't get it, what are we supposed to suck up? And who is even supposed to?
Are we supposed to suck up the performance impact, or are we supposed to get it done somehow?
@liggitt In 2024, what's the best JSON library in Go for building a document part by part and streaming it directly, rather than serializing everything into one big string?
Gojay seems like a good alternative: it has good performance, custom encoding/decoding, and a streaming API: https://github.com/francoispqt/gojay
If we were going to pull in another json library, I'd aim for the same one we used in kube-openapi (https://github.com/kubernetes/kube-openapi/tree/master/pkg/internal/third_party/go-json-experiment), since that's where it looks like the stdlib is headed (https://github.com/golang/go/discussions/63397), and it supports streaming, custom encoding/decoding, etc.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale