fastjson
Rewriting the parser using stacks
After figuring out that the benchmark was defective, I created a new parser using the technique mentioned in #24.
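For readers without #24 at hand, the following is only a general sketch of what "parsing with an explicit stack" usually means, not the code from this PR. It handles nested arrays of non-negative integers rather than full JSON, and the names `node` and `parseArrays` are made up for the illustration. Instead of recursing for each nested value, the currently open containers are kept on a slice that is pushed on `[` and popped on `]`.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical value type: either a number or an array of nodes.
type node struct {
	isArray bool
	num     int
	items   []*node
}

// parseArrays parses nested arrays of non-negative integers, e.g.
// "[1,[2,3],[]]", without recursion: open arrays live on an explicit stack.
func parseArrays(s string) (*node, error) {
	var stack []*node // arrays that are currently open, innermost last
	var root *node

	// attach appends v to the innermost open array, or makes it the root.
	attach := func(v *node) error {
		if len(stack) == 0 {
			if root != nil {
				return errors.New("multiple top-level values")
			}
			root = v
			return nil
		}
		top := stack[len(stack)-1]
		top.items = append(top.items, v)
		return nil
	}

	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c == ' ' || c == ',':
			// Separators are simply skipped; a real parser would validate them.
		case c == '[':
			n := &node{isArray: true}
			if err := attach(n); err != nil {
				return nil, err
			}
			stack = append(stack, n) // push
		case c == ']':
			if len(stack) == 0 {
				return nil, errors.New("unexpected ]")
			}
			stack = stack[:len(stack)-1] // pop
		case c >= '0' && c <= '9':
			v := 0
			for i < len(s) && s[i] >= '0' && s[i] <= '9' {
				v = v*10 + int(s[i]-'0')
				i++
			}
			i-- // the outer loop's i++ steps onto the byte after the number
			if err := attach(&node{num: v}); err != nil {
				return nil, err
			}
		default:
			return nil, fmt.Errorf("unexpected byte %q at offset %d", c, i)
		}
	}
	if len(stack) != 0 {
		return nil, errors.New("unclosed array")
	}
	if root == nil {
		return nil, errors.New("empty input")
	}
	return root, nil
}

func main() {
	root, err := parseArrays("[1,[2,3],[]]")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("top-level items:", len(root.items))
}
```

The stack slice can be truncated and reused between calls, which is one reason this shape tends to allocate less than a recursive parser that creates containers as it descends.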
The benchmark results without benchPool, which should be closer to the real performance:
goos: linux
goarch: amd64
pkg: github.com/valyala/fastjson
BenchmarkParse/small/stdjson-map-4 500000 3824 ns/op 49.68 MB/s 960 B/op 51 allocs/op
BenchmarkParse/small/stdjson-struct-4 1000000 2005 ns/op 94.75 MB/s 224 B/op 4 allocs/op
BenchmarkParse/small/stdjson-empty-struct-4 1000000 1420 ns/op 133.75 MB/s 168 B/op 2 allocs/op
BenchmarkParse/small/fastjson-4 1000000 2046 ns/op 92.85 MB/s 3424 B/op 11 allocs/op
BenchmarkParse/small/fastjson-get-4 500000 2178 ns/op 87.21 MB/s 3424 B/op 11 allocs/op
BenchmarkParse/medium/stdjson-map-4 50000 23996 ns/op 97.05 MB/s 10195 B/op 208 allocs/op
BenchmarkParse/medium/stdjson-struct-4 50000 27107 ns/op 85.92 MB/s 9174 B/op 258 allocs/op
BenchmarkParse/medium/stdjson-empty-struct-4 100000 10688 ns/op 217.90 MB/s 280 B/op 5 allocs/op
BenchmarkParse/medium/fastjson-4 200000 9703 ns/op 240.01 MB/s 17688 B/op 54 allocs/op
BenchmarkParse/medium/fastjson-get-4 200000 9814 ns/op 237.30 MB/s 17688 B/op 54 allocs/op
BenchmarkParse/large/stdjson-map-4 5000 335337 ns/op 83.85 MB/s 210764 B/op 2785 allocs/op
BenchmarkParse/large/stdjson-struct-4 10000 140989 ns/op 199.43 MB/s 15617 B/op 353 allocs/op
BenchmarkParse/large/stdjson-empty-struct-4 10000 131005 ns/op 214.63 MB/s 280 B/op 5 allocs/op
BenchmarkParse/large/fastjson-4 10000 122384 ns/op 229.75 MB/s 283200 B/op 540 allocs/op
BenchmarkParse/large/fastjson-get-4 10000 121147 ns/op 232.10 MB/s 283200 B/op 540 allocs/op
BenchmarkParse/canada/stdjson-map-4 30 39510900 ns/op 56.97 MB/s 12260534 B/op 392539 allocs/op
BenchmarkParse/canada/stdjson-struct-4 50 40044040 ns/op 56.21 MB/s 12260139 B/op 392534 allocs/op
BenchmarkParse/canada/stdjson-empty-struct-4 200 9637495 ns/op 233.57 MB/s 291 B/op 5 allocs/op
BenchmarkParse/canada/fastjson-4 20 100080750 ns/op 22.49 MB/s 75844145 B/op 114252 allocs/op
BenchmarkParse/canada/fastjson-get-4 30 54762833 ns/op 41.11 MB/s 75844142 B/op 114252 allocs/op
BenchmarkParse/citm/stdjson-map-4 100 15829930 ns/op 109.11 MB/s 5214145 B/op 95402 allocs/op
BenchmarkParse/citm/stdjson-struct-4 200 7847090 ns/op 220.11 MB/s 1993 B/op 75 allocs/op
BenchmarkParse/citm/stdjson-empty-struct-4 200 7860580 ns/op 219.73 MB/s 281 B/op 5 allocs/op
BenchmarkParse/citm/fastjson-4 100 10795730 ns/op 159.99 MB/s 17601362 B/op 30574 allocs/op
BenchmarkParse/citm/fastjson-get-4 200 10633195 ns/op 162.44 MB/s 17601360 B/op 30574 allocs/op
BenchmarkParse/twitter/stdjson-map-4 200 5939440 ns/op 106.33 MB/s 2187556 B/op 31264 allocs/op
BenchmarkParse/twitter/stdjson-struct-4 500 2821878 ns/op 223.79 MB/s 409 B/op 6 allocs/op
BenchmarkParse/twitter/stdjson-empty-struct-4 500 2807618 ns/op 224.93 MB/s 408 B/op 6 allocs/op
BenchmarkParse/twitter/fastjson-4 500 2810480 ns/op 224.70 MB/s 5047840 B/op 4729 allocs/op
BenchmarkParse/twitter/fastjson-get-4 500 2816916 ns/op 224.19 MB/s 5047840 B/op 4729 allocs/op
PASS
ok github.com/valyala/fastjson 60.703s
fastjson was even slower than stdjson in some cases (e.g. canada and citm), despite the claimed 15x improvement.
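As a rough way to quantify this, the ns/op figures from the run above can be turned into ratios. The helper name `speedup` is just for this sketch; the numbers are copied from the benchmark output without benchPool.

```go
package main

import "fmt"

// speedup reports how many times faster the candidate is than the baseline,
// given ns/op for each. A value below 1 means the candidate is slower.
func speedup(baselineNsPerOp, candidateNsPerOp float64) float64 {
	return baselineNsPerOp / candidateNsPerOp
}

func main() {
	// ns/op values taken from the run without benchPool.
	fmt.Printf("canada,  stdjson-map    vs fastjson: %.2fx\n", speedup(39510900, 100080750))
	fmt.Printf("citm,    stdjson-struct vs fastjson: %.2fx\n", speedup(7847090, 10795730))
	fmt.Printf("twitter, stdjson-struct vs fastjson: %.2fx\n", speedup(2821878, 2810480))
}
```

The canada and citm ratios come out below 1, and twitter is close to parity, which is nowhere near a 15x gap.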
After rewriting the parser using stacks, the results are:
goos: linux
goarch: amd64
pkg: github.com/valyala/fastjson
BenchmarkParse/small/stdjson-map-4 300000 3967 ns/op 47.89 MB/s 960 B/op 51 allocs/op
BenchmarkParse/small/stdjson-struct-4 1000000 1979 ns/op 95.99 MB/s 224 B/op 4 allocs/op
BenchmarkParse/small/stdjson-empty-struct-4 1000000 1422 ns/op 133.57 MB/s 168 B/op 2 allocs/op
BenchmarkParse/small/fastjson-4 1000000 1458 ns/op 130.28 MB/s 2576 B/op 11 allocs/op
BenchmarkParse/small/fastjson-get-4 1000000 1585 ns/op 119.86 MB/s 2576 B/op 11 allocs/op
BenchmarkParse/medium/stdjson-map-4 100000 22231 ns/op 104.76 MB/s 10195 B/op 208 allocs/op
BenchmarkParse/medium/stdjson-struct-4 50000 25822 ns/op 90.19 MB/s 9174 B/op 258 allocs/op
BenchmarkParse/medium/stdjson-empty-struct-4 200000 10289 ns/op 226.35 MB/s 280 B/op 5 allocs/op
BenchmarkParse/medium/fastjson-4 200000 8781 ns/op 265.22 MB/s 17528 B/op 19 allocs/op
BenchmarkParse/medium/fastjson-get-4 200000 8879 ns/op 262.29 MB/s 17528 B/op 19 allocs/op
BenchmarkParse/large/stdjson-map-4 3000 337614 ns/op 83.28 MB/s 210761 B/op 2785 allocs/op
BenchmarkParse/large/stdjson-struct-4 10000 146534 ns/op 191.89 MB/s 15617 B/op 353 allocs/op
BenchmarkParse/large/stdjson-empty-struct-4 10000 129590 ns/op 216.98 MB/s 280 B/op 5 allocs/op
BenchmarkParse/large/fastjson-4 20000 65955 ns/op 426.32 MB/s 165288 B/op 37 allocs/op
BenchmarkParse/large/fastjson-get-4 20000 65889 ns/op 426.74 MB/s 165288 B/op 37 allocs/op
BenchmarkParse/canada/stdjson-map-4 30 39993266 ns/op 56.29 MB/s 12260535 B/op 392539 allocs/op
BenchmarkParse/canada/stdjson-struct-4 50 40921580 ns/op 55.01 MB/s 12260137 B/op 392534 allocs/op
BenchmarkParse/canada/stdjson-empty-struct-4 200 9418370 ns/op 239.01 MB/s 280 B/op 5 allocs/op
BenchmarkParse/canada/fastjson-4 100 15223640 ns/op 147.87 MB/s 25831338 B/op 60 allocs/op
BenchmarkParse/canada/fastjson-get-4 100 14208220 ns/op 158.43 MB/s 25831338 B/op 60 allocs/op
BenchmarkParse/citm/stdjson-map-4 100 16219920 ns/op 106.49 MB/s 5213919 B/op 95401 allocs/op
BenchmarkParse/citm/stdjson-struct-4 200 7732285 ns/op 223.38 MB/s 1993 B/op 75 allocs/op
BenchmarkParse/citm/stdjson-empty-struct-4 200 7791690 ns/op 221.67 MB/s 281 B/op 5 allocs/op
BenchmarkParse/citm/fastjson-4 300 4030463 ns/op 428.54 MB/s 7909291 B/op 59 allocs/op
BenchmarkParse/citm/fastjson-get-4 300 3764146 ns/op 458.86 MB/s 7909289 B/op 59 allocs/op
BenchmarkParse/twitter/stdjson-map-4 200 6131945 ns/op 102.99 MB/s 2188071 B/op 31266 allocs/op
BenchmarkParse/twitter/stdjson-struct-4 500 2844898 ns/op 221.98 MB/s 409 B/op 6 allocs/op
BenchmarkParse/twitter/stdjson-empty-struct-4 500 2823320 ns/op 223.68 MB/s 408 B/op 6 allocs/op
BenchmarkParse/twitter/fastjson-4 1000 1084364 ns/op 582.38 MB/s 2359209 B/op 49 allocs/op
BenchmarkParse/twitter/fastjson-get-4 1000 1068186 ns/op 591.20 MB/s 2359208 B/op 49 allocs/op
PASS
ok github.com/valyala/fastjson 57.055s
fastjson is still slower than the original, but now it is genuinely faster than stdjson, rather than relying on fake results obtained by parsing the exact same JSON value again and again.
There is no need to reuse the Parser now, since it is always reset before Parse. There may still be room for improvement from reusing the Parser, which is worth investigating later.
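A minimal sketch of the "reset before parse, then reuse" idea, assuming a made-up `parser` type that splits on commas instead of parsing JSON. It is not fastjson's Parser; it only shows how truncating a slice to length zero keeps its capacity, so a reused parser can skip the allocations a fresh one has to make.

```go
package main

import (
	"fmt"
	"sync"
)

// parser is a hypothetical stand-in for a parser that keeps its
// scratch buffer between calls.
type parser struct {
	tokens []string
}

// parse resets the buffer to length zero (keeping its capacity) and then
// splits s on commas. The returned slice is only valid until the next call.
func (p *parser) parse(s string) []string {
	p.tokens = p.tokens[:0] // reset: previous results are discarded
	start := 0
	for i := 0; i <= len(s); i++ {
		if i == len(s) || s[i] == ',' {
			p.tokens = append(p.tokens, s[start:i])
			start = i + 1
		}
	}
	return p.tokens
}

// pool hands out parsers so that their buffers can be reused across calls.
var pool = sync.Pool{New: func() interface{} { return new(parser) }}

// countFields borrows a parser, uses it, and returns it to the pool.
// Nothing that aliases the parser's buffer escapes this function.
func countFields(s string) int {
	p := pool.Get().(*parser)
	defer pool.Put(p)
	return len(p.parse(s))
}

func main() {
	p := &parser{}
	p.parse("a,b,c")
	before := cap(p.tokens)
	p.parse("x,y")
	fmt.Println("capacity kept across parses:", cap(p.tokens) == before)
	fmt.Println("fields:", countFields("1,2,3,4"))
}
```

The trade-off is that results alias the parser's buffer, so a pooled parser must not be returned to the pool while its last result is still in use.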
Codecov Report
Merging #25 into master will decrease coverage by 0.23%. The diff coverage is 94.87%.
@@ Coverage Diff @@
## master #25 +/- ##
==========================================
- Coverage 93.11% 92.88% -0.24%
==========================================
Files 9 9
Lines 1046 1124 +78
==========================================
+ Hits 974 1044 +70
- Misses 49 55 +6
- Partials 23 25 +2
| Impacted Files | Coverage Δ | |
|---|---|---|
| scanner.go | 100% <100%> (ø) | :arrow_up: |
| arena.go | 100% <100%> (ø) | :arrow_up: |
| update.go | 78.87% <82.35%> (+0.69%) | :arrow_up: |
| parser.go | 90.36% <96.9%> (+0.19%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cbbc967...651c591.
After reusing the parser, here are the new benchmark results:
goos: linux
goarch: amd64
pkg: github.com/valyala/fastjson
BenchmarkParse/small/stdjson-map-4 300000 4660 ns/op 40.77 MB/s 960 B/op 51 allocs/op
BenchmarkParse/small/stdjson-struct-4 1000000 2362 ns/op 80.43 MB/s 224 B/op 4 allocs/op
BenchmarkParse/small/stdjson-empty-struct-4 1000000 1442 ns/op 131.69 MB/s 168 B/op 2 allocs/op
BenchmarkParse/small/fastjson-4 5000000 356 ns/op 532.46 MB/s 192 B/op 1 allocs/op
BenchmarkParse/small/fastjson-get-4 3000000 496 ns/op 382.78 MB/s 192 B/op 1 allocs/op
BenchmarkParse/medium/stdjson-map-4 50000 25757 ns/op 90.42 MB/s 10195 B/op 208 allocs/op
BenchmarkParse/medium/stdjson-struct-4 50000 30011 ns/op 77.60 MB/s 9174 B/op 258 allocs/op
BenchmarkParse/medium/stdjson-empty-struct-4 100000 11133 ns/op 209.18 MB/s 280 B/op 5 allocs/op
BenchmarkParse/medium/fastjson-4 500000 3786 ns/op 615.14 MB/s 2688 B/op 1 allocs/op
BenchmarkParse/medium/fastjson-get-4 300000 3373 ns/op 690.34 MB/s 2688 B/op 1 allocs/op
BenchmarkParse/large/stdjson-map-4 3000 375616 ns/op 74.86 MB/s 210749 B/op 2785 allocs/op
BenchmarkParse/large/stdjson-struct-4 10000 160624 ns/op 175.05 MB/s 15617 B/op 353 allocs/op
BenchmarkParse/large/stdjson-empty-struct-4 10000 136074 ns/op 206.64 MB/s 280 B/op 5 allocs/op
BenchmarkParse/large/fastjson-4 30000 38770 ns/op 725.24 MB/s 28707 B/op 1 allocs/op
BenchmarkParse/large/fastjson-get-4 30000 50005 ns/op 562.30 MB/s 28707 B/op 1 allocs/op
BenchmarkParse/canada/stdjson-map-4 20 56081600 ns/op 40.14 MB/s 12260568 B/op 392540 allocs/op
BenchmarkParse/canada/stdjson-struct-4 30 42409066 ns/op 53.08 MB/s 12260171 B/op 392534 allocs/op
BenchmarkParse/canada/stdjson-empty-struct-4 200 10085860 ns/op 223.19 MB/s 281 B/op 5 allocs/op
BenchmarkParse/canada/fastjson-4 200 7093665 ns/op 317.33 MB/s 3185747 B/op 2 allocs/op
BenchmarkParse/canada/fastjson-get-4 200 6163670 ns/op 365.21 MB/s 3185753 B/op 3 allocs/op
BenchmarkParse/citm/stdjson-map-4 100 18414800 ns/op 93.79 MB/s 5214044 B/op 95402 allocs/op
BenchmarkParse/citm/stdjson-struct-4 200 9872230 ns/op 174.96 MB/s 1994 B/op 75 allocs/op
BenchmarkParse/citm/stdjson-empty-struct-4 200 8464495 ns/op 204.05 MB/s 284 B/op 5 allocs/op
BenchmarkParse/citm/fastjson-4 500 2291844 ns/op 753.63 MB/s 1827241 B/op 1 allocs/op
BenchmarkParse/citm/fastjson-get-4 1000 2131119 ns/op 810.47 MB/s 1777878 B/op 1 allocs/op
BenchmarkParse/twitter/stdjson-map-4 200 6208910 ns/op 101.71 MB/s 2188398 B/op 31267 allocs/op
BenchmarkParse/twitter/stdjson-struct-4 500 3027482 ns/op 208.59 MB/s 409 B/op 6 allocs/op
BenchmarkParse/twitter/stdjson-empty-struct-4 500 2823266 ns/op 223.68 MB/s 408 B/op 6 allocs/op
BenchmarkParse/twitter/fastjson-4 2000 775716 ns/op 814.10 MB/s 645840 B/op 1 allocs/op
BenchmarkParse/twitter/fastjson-get-4 2000 787570 ns/op 801.85 MB/s 645840 B/op 1 allocs/op
PASS
ok github.com/valyala/fastjson 62.245s
Now it's much closer to the original performance. Here is the CPU profile:
File: fastjson.test
Type: cpu
Time: Jan 22, 2019 at 11:11pm (CST)
Duration: 21.35s, Total samples = 52.83s (247.49%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 36.16s, 68.45% of 52.83s total
Dropped 141 nodes (cum <= 0.26s)
Showing top 10 nodes out of 60
flat flat% sum% cum cum%
6.45s 12.21% 12.21% 42.34s 80.14% github.com/valyala/fastjson.(*Parser).parseObject
5.42s 10.26% 22.47% 8.69s 16.45% github.com/valyala/fastjson.skipWS
5.01s 9.48% 31.95% 42.41s 80.28% github.com/valyala/fastjson.(*Parser).parseValue
3.59s 6.80% 38.75% 3.59s 6.80% github.com/valyala/fastjson.parseRawKey
3.36s 6.36% 45.11% 3.36s 6.36% runtime.memmove
3.27s 6.19% 51.30% 3.27s 6.19% github.com/valyala/fastjson.skipWSSlow
2.83s 5.36% 56.65% 3.54s 6.70% runtime.findObject
2.43s 4.60% 61.25% 2.88s 5.45% github.com/valyala/fastjson.(*Parser).getValue
2.01s 3.80% 65.06% 27.18s 51.45% github.com/valyala/fastjson.(*Parser).parseArray
1.79s 3.39% 68.45% 5.55s 10.51% runtime.wbBufFlush1
@valyala can this be merged as well?
This is a great idea, since it requires less RAM when parsing JSON structs with non-constant structure, but it slows down parsing a bit. I tried playing with this PR in the mem-optimize2 branch, but it isn't ready for merging into master.
Any update on this PR? I like the concept of this library, but I'm seeing memory leaks when using parserPool, and using it without a pool is slow.