fastjson
Rewriting the parser using stacks
After figuring out that the benchmark was defective, I created a new parser using the technique mentioned in #24 .
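The thread does not reproduce the technique from #24, so the following is only one plausible reading of "using stacks", not the code in this PR: a recursive-descent parser that pushes the children of each array onto one shared scratch stack and copies them into an exactly sized slice when the array closes. The names `Value` and `StackParser` and the toy grammar (nested arrays of non-negative integers, no whitespace) are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a hypothetical parsed value: a number or an array of values.
type Value struct {
	IsArr bool
	Num   int
	Items []Value
}

// StackParser keeps one scratch stack that is shared by all nesting levels
// and reused across Parse calls.
type StackParser struct {
	stack []Value
	s     string
	pos   int
}

// Parse resets the parser state and parses s, e.g. "[1,[2,3],[]]".
func (p *StackParser) Parse(s string) (Value, error) {
	p.s, p.pos, p.stack = s, 0, p.stack[:0]
	v, err := p.parseValue()
	if err != nil {
		return Value{}, err
	}
	if p.pos != len(s) {
		return Value{}, errors.New("trailing data after value")
	}
	return v, nil
}

func (p *StackParser) parseValue() (Value, error) {
	if p.pos >= len(p.s) {
		return Value{}, errors.New("unexpected end of input")
	}
	c := p.s[p.pos]
	if c == '[' {
		p.pos++
		return p.parseArray()
	}
	if c < '0' || c > '9' {
		return Value{}, fmt.Errorf("unexpected byte %q at offset %d", c, p.pos)
	}
	n := 0
	for p.pos < len(p.s) && p.s[p.pos] >= '0' && p.s[p.pos] <= '9' {
		n = n*10 + int(p.s[p.pos]-'0')
		p.pos++
	}
	return Value{Num: n}, nil
}

func (p *StackParser) parseArray() (Value, error) {
	base := len(p.stack) // children of this array live at stack[base:]
	if p.pos < len(p.s) && p.s[p.pos] == ']' {
		p.pos++
		return Value{IsArr: true}, nil
	}
	for {
		v, err := p.parseValue()
		if err != nil {
			return Value{}, err
		}
		p.stack = append(p.stack, v)
		if p.pos >= len(p.s) {
			return Value{}, errors.New("unclosed [")
		}
		switch p.s[p.pos] {
		case ',':
			p.pos++
		case ']':
			p.pos++
			// One exactly sized allocation per array, then pop the children.
			items := make([]Value, len(p.stack)-base)
			copy(items, p.stack[base:])
			p.stack = p.stack[:base]
			return Value{IsArr: true, Items: items}, nil
		default:
			return Value{}, fmt.Errorf("unexpected byte %q at offset %d", p.s[p.pos], p.pos)
		}
	}
}

func main() {
	var p StackParser
	v, err := p.Parse("[1,[2,3],[]]")
	if err != nil {
		panic(err)
	}
	fmt.Println("top-level items:", len(v.Items))
}
```

In this sketch each array costs a single allocation regardless of how many elements it has, because growth happens only on the shared stack. Whether that matches the allocation pattern behind the numbers in this PR is not something the thread confirms.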
The benchmark results without benchPool, which should be closer to the real performance:
goos: linux
goarch: amd64
pkg: github.com/valyala/fastjson
BenchmarkParse/small/stdjson-map-4 500000 3824 ns/op 49.68 MB/s 960 B/op 51 allocs/op
BenchmarkParse/small/stdjson-struct-4 1000000 2005 ns/op 94.75 MB/s 224 B/op 4 allocs/op
BenchmarkParse/small/stdjson-empty-struct-4 1000000 1420 ns/op 133.75 MB/s 168 B/op 2 allocs/op
BenchmarkParse/small/fastjson-4 1000000 2046 ns/op 92.85 MB/s 3424 B/op 11 allocs/op
BenchmarkParse/small/fastjson-get-4 500000 2178 ns/op 87.21 MB/s 3424 B/op 11 allocs/op
BenchmarkParse/medium/stdjson-map-4 50000 23996 ns/op 97.05 MB/s 10195 B/op 208 allocs/op
BenchmarkParse/medium/stdjson-struct-4 50000 27107 ns/op 85.92 MB/s 9174 B/op 258 allocs/op
BenchmarkParse/medium/stdjson-empty-struct-4 100000 10688 ns/op 217.90 MB/s 280 B/op 5 allocs/op
BenchmarkParse/medium/fastjson-4 200000 9703 ns/op 240.01 MB/s 17688 B/op 54 allocs/op
BenchmarkParse/medium/fastjson-get-4 200000 9814 ns/op 237.30 MB/s 17688 B/op 54 allocs/op
BenchmarkParse/large/stdjson-map-4 5000 335337 ns/op 83.85 MB/s 210764 B/op 2785 allocs/op
BenchmarkParse/large/stdjson-struct-4 10000 140989 ns/op 199.43 MB/s 15617 B/op 353 allocs/op
BenchmarkParse/large/stdjson-empty-struct-4 10000 131005 ns/op 214.63 MB/s 280 B/op 5 allocs/op
BenchmarkParse/large/fastjson-4 10000 122384 ns/op 229.75 MB/s 283200 B/op 540 allocs/op
BenchmarkParse/large/fastjson-get-4 10000 121147 ns/op 232.10 MB/s 283200 B/op 540 allocs/op
BenchmarkParse/canada/stdjson-map-4 30 39510900 ns/op 56.97 MB/s 12260534 B/op 392539 allocs/op
BenchmarkParse/canada/stdjson-struct-4 50 40044040 ns/op 56.21 MB/s 12260139 B/op 392534 allocs/op
BenchmarkParse/canada/stdjson-empty-struct-4 200 9637495 ns/op 233.57 MB/s 291 B/op 5 allocs/op
BenchmarkParse/canada/fastjson-4 20 100080750 ns/op 22.49 MB/s 75844145 B/op 114252 allocs/op
BenchmarkParse/canada/fastjson-get-4 30 54762833 ns/op 41.11 MB/s 75844142 B/op 114252 allocs/op
BenchmarkParse/citm/stdjson-map-4 100 15829930 ns/op 109.11 MB/s 5214145 B/op 95402 allocs/op
BenchmarkParse/citm/stdjson-struct-4 200 7847090 ns/op 220.11 MB/s 1993 B/op 75 allocs/op
BenchmarkParse/citm/stdjson-empty-struct-4 200 7860580 ns/op 219.73 MB/s 281 B/op 5 allocs/op
BenchmarkParse/citm/fastjson-4 100 10795730 ns/op 159.99 MB/s 17601362 B/op 30574 allocs/op
BenchmarkParse/citm/fastjson-get-4 200 10633195 ns/op 162.44 MB/s 17601360 B/op 30574 allocs/op
BenchmarkParse/twitter/stdjson-map-4 200 5939440 ns/op 106.33 MB/s 2187556 B/op 31264 allocs/op
BenchmarkParse/twitter/stdjson-struct-4 500 2821878 ns/op 223.79 MB/s 409 B/op 6 allocs/op
BenchmarkParse/twitter/stdjson-empty-struct-4 500 2807618 ns/op 224.93 MB/s 408 B/op 6 allocs/op
BenchmarkParse/twitter/fastjson-4 500 2810480 ns/op 224.70 MB/s 5047840 B/op 4729 allocs/op
BenchmarkParse/twitter/fastjson-get-4 500 2816916 ns/op 224.19 MB/s 5047840 B/op 4729 allocs/op
PASS
ok github.com/valyala/fastjson 60.703s
fastjson was even slower than stdjson on several inputs (for example canada and citm), despite the claimed 15x improvement.
After rewriting the parser using stacks, the results are:
goos: linux
goarch: amd64
pkg: github.com/valyala/fastjson
BenchmarkParse/small/stdjson-map-4 300000 3967 ns/op 47.89 MB/s 960 B/op 51 allocs/op
BenchmarkParse/small/stdjson-struct-4 1000000 1979 ns/op 95.99 MB/s 224 B/op 4 allocs/op
BenchmarkParse/small/stdjson-empty-struct-4 1000000 1422 ns/op 133.57 MB/s 168 B/op 2 allocs/op
BenchmarkParse/small/fastjson-4 1000000 1458 ns/op 130.28 MB/s 2576 B/op 11 allocs/op
BenchmarkParse/small/fastjson-get-4 1000000 1585 ns/op 119.86 MB/s 2576 B/op 11 allocs/op
BenchmarkParse/medium/stdjson-map-4 100000 22231 ns/op 104.76 MB/s 10195 B/op 208 allocs/op
BenchmarkParse/medium/stdjson-struct-4 50000 25822 ns/op 90.19 MB/s 9174 B/op 258 allocs/op
BenchmarkParse/medium/stdjson-empty-struct-4 200000 10289 ns/op 226.35 MB/s 280 B/op 5 allocs/op
BenchmarkParse/medium/fastjson-4 200000 8781 ns/op 265.22 MB/s 17528 B/op 19 allocs/op
BenchmarkParse/medium/fastjson-get-4 200000 8879 ns/op 262.29 MB/s 17528 B/op 19 allocs/op
BenchmarkParse/large/stdjson-map-4 3000 337614 ns/op 83.28 MB/s 210761 B/op 2785 allocs/op
BenchmarkParse/large/stdjson-struct-4 10000 146534 ns/op 191.89 MB/s 15617 B/op 353 allocs/op
BenchmarkParse/large/stdjson-empty-struct-4 10000 129590 ns/op 216.98 MB/s 280 B/op 5 allocs/op
BenchmarkParse/large/fastjson-4 20000 65955 ns/op 426.32 MB/s 165288 B/op 37 allocs/op
BenchmarkParse/large/fastjson-get-4 20000 65889 ns/op 426.74 MB/s 165288 B/op 37 allocs/op
BenchmarkParse/canada/stdjson-map-4 30 39993266 ns/op 56.29 MB/s 12260535 B/op 392539 allocs/op
BenchmarkParse/canada/stdjson-struct-4 50 40921580 ns/op 55.01 MB/s 12260137 B/op 392534 allocs/op
BenchmarkParse/canada/stdjson-empty-struct-4 200 9418370 ns/op 239.01 MB/s 280 B/op 5 allocs/op
BenchmarkParse/canada/fastjson-4 100 15223640 ns/op 147.87 MB/s 25831338 B/op 60 allocs/op
BenchmarkParse/canada/fastjson-get-4 100 14208220 ns/op 158.43 MB/s 25831338 B/op 60 allocs/op
BenchmarkParse/citm/stdjson-map-4 100 16219920 ns/op 106.49 MB/s 5213919 B/op 95401 allocs/op
BenchmarkParse/citm/stdjson-struct-4 200 7732285 ns/op 223.38 MB/s 1993 B/op 75 allocs/op
BenchmarkParse/citm/stdjson-empty-struct-4 200 7791690 ns/op 221.67 MB/s 281 B/op 5 allocs/op
BenchmarkParse/citm/fastjson-4 300 4030463 ns/op 428.54 MB/s 7909291 B/op 59 allocs/op
BenchmarkParse/citm/fastjson-get-4 300 3764146 ns/op 458.86 MB/s 7909289 B/op 59 allocs/op
BenchmarkParse/twitter/stdjson-map-4 200 6131945 ns/op 102.99 MB/s 2188071 B/op 31266 allocs/op
BenchmarkParse/twitter/stdjson-struct-4 500 2844898 ns/op 221.98 MB/s 409 B/op 6 allocs/op
BenchmarkParse/twitter/stdjson-empty-struct-4 500 2823320 ns/op 223.68 MB/s 408 B/op 6 allocs/op
BenchmarkParse/twitter/fastjson-4 1000 1084364 ns/op 582.38 MB/s 2359209 B/op 49 allocs/op
BenchmarkParse/twitter/fastjson-get-4 1000 1068186 ns/op 591.20 MB/s 2359208 B/op 49 allocs/op
PASS
ok github.com/valyala/fastjson 57.055s
fastjson is still slower than the original, but now it is genuinely faster than stdjson, rather than appearing faster because of fake results obtained by parsing the exact same JSON value again and again.
There is no need to reuse Parser now, since it is always reset before Parse. There may still be room for improvement by reusing Parser, which is worth investigating later.
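To illustrate why reusing a parser that resets itself can still pay off, here is a minimal sketch that does not use fastjson at all. `NumParser`, `SampleInput`, `AllocsFresh` and `AllocsReused` are hypothetical names; the only point is that a reset which keeps the capacity of internal storage lets later calls avoid allocating, which `testing.AllocsPerRun` can measure.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// NumParser is a hypothetical stand-in for a parser that owns reusable storage.
type NumParser struct {
	nums []int
}

// sink keeps results reachable so the backing array must live on the heap.
var sink []int

// Parse resets the internal slice (keeping its capacity) and collects the
// non-negative integers found in s. The result is valid until the next Parse.
func (p *NumParser) Parse(s string) []int {
	p.nums = p.nums[:0]
	n, inNum := 0, false
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= '0' && c <= '9' {
			n = n*10 + int(c-'0')
			inNum = true
			continue
		}
		if inNum {
			p.nums = append(p.nums, n)
			n, inNum = 0, false
		}
	}
	if inNum {
		p.nums = append(p.nums, n)
	}
	return p.nums
}

// SampleInput returns 256 comma-separated numbers.
func SampleInput() string {
	return strings.Repeat("42,", 256)
}

// AllocsFresh measures allocations when every call builds a new parser.
func AllocsFresh(input string) float64 {
	return testing.AllocsPerRun(100, func() {
		var p NumParser
		sink = p.Parse(input)
	})
}

// AllocsReused measures allocations when one parser is reset and reused.
func AllocsReused(input string) float64 {
	var p NumParser
	return testing.AllocsPerRun(100, func() {
		sink = p.Parse(input)
	})
}

func main() {
	input := SampleInput()
	fmt.Println("allocs/op, fresh parser: ", AllocsFresh(input))
	fmt.Println("allocs/op, reused parser:", AllocsReused(input))
}
```

The reused variant stops allocating once its slice has grown large enough, because `AllocsPerRun` performs one warm-up call before measuring. The trade-off is that results from an earlier call are overwritten by the next one, so callers must not hold on to them.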
Codecov Report
Merging #25 into master will decrease coverage by 0.23%. The diff coverage is 94.87%.
@@ Coverage Diff @@
## master #25 +/- ##
==========================================
- Coverage 93.11% 92.88% -0.24%
==========================================
Files 9 9
Lines 1046 1124 +78
==========================================
+ Hits 974 1044 +70
- Misses 49 55 +6
- Partials 23 25 +2
Impacted Files | Coverage Δ |
---|---|
scanner.go | 100% <100%> (ø) |
arena.go | 100% <100%> (ø) |
update.go | 78.87% <82.35%> (+0.69%) |
parser.go | 90.36% <96.9%> (+0.19%) |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cbbc967...651c591.
After reusing the parser, here are the new benchmark results:
goos: linux
goarch: amd64
pkg: github.com/valyala/fastjson
BenchmarkParse/small/stdjson-map-4 300000 4660 ns/op 40.77 MB/s 960 B/op 51 allocs/op
BenchmarkParse/small/stdjson-struct-4 1000000 2362 ns/op 80.43 MB/s 224 B/op 4 allocs/op
BenchmarkParse/small/stdjson-empty-struct-4 1000000 1442 ns/op 131.69 MB/s 168 B/op 2 allocs/op
BenchmarkParse/small/fastjson-4 5000000 356 ns/op 532.46 MB/s 192 B/op 1 allocs/op
BenchmarkParse/small/fastjson-get-4 3000000 496 ns/op 382.78 MB/s 192 B/op 1 allocs/op
BenchmarkParse/medium/stdjson-map-4 50000 25757 ns/op 90.42 MB/s 10195 B/op 208 allocs/op
BenchmarkParse/medium/stdjson-struct-4 50000 30011 ns/op 77.60 MB/s 9174 B/op 258 allocs/op
BenchmarkParse/medium/stdjson-empty-struct-4 100000 11133 ns/op 209.18 MB/s 280 B/op 5 allocs/op
BenchmarkParse/medium/fastjson-4 500000 3786 ns/op 615.14 MB/s 2688 B/op 1 allocs/op
BenchmarkParse/medium/fastjson-get-4 300000 3373 ns/op 690.34 MB/s 2688 B/op 1 allocs/op
BenchmarkParse/large/stdjson-map-4 3000 375616 ns/op 74.86 MB/s 210749 B/op 2785 allocs/op
BenchmarkParse/large/stdjson-struct-4 10000 160624 ns/op 175.05 MB/s 15617 B/op 353 allocs/op
BenchmarkParse/large/stdjson-empty-struct-4 10000 136074 ns/op 206.64 MB/s 280 B/op 5 allocs/op
BenchmarkParse/large/fastjson-4 30000 38770 ns/op 725.24 MB/s 28707 B/op 1 allocs/op
BenchmarkParse/large/fastjson-get-4 30000 50005 ns/op 562.30 MB/s 28707 B/op 1 allocs/op
BenchmarkParse/canada/stdjson-map-4 20 56081600 ns/op 40.14 MB/s 12260568 B/op 392540 allocs/op
BenchmarkParse/canada/stdjson-struct-4 30 42409066 ns/op 53.08 MB/s 12260171 B/op 392534 allocs/op
BenchmarkParse/canada/stdjson-empty-struct-4 200 10085860 ns/op 223.19 MB/s 281 B/op 5 allocs/op
BenchmarkParse/canada/fastjson-4 200 7093665 ns/op 317.33 MB/s 3185747 B/op 2 allocs/op
BenchmarkParse/canada/fastjson-get-4 200 6163670 ns/op 365.21 MB/s 3185753 B/op 3 allocs/op
BenchmarkParse/citm/stdjson-map-4 100 18414800 ns/op 93.79 MB/s 5214044 B/op 95402 allocs/op
BenchmarkParse/citm/stdjson-struct-4 200 9872230 ns/op 174.96 MB/s 1994 B/op 75 allocs/op
BenchmarkParse/citm/stdjson-empty-struct-4 200 8464495 ns/op 204.05 MB/s 284 B/op 5 allocs/op
BenchmarkParse/citm/fastjson-4 500 2291844 ns/op 753.63 MB/s 1827241 B/op 1 allocs/op
BenchmarkParse/citm/fastjson-get-4 1000 2131119 ns/op 810.47 MB/s 1777878 B/op 1 allocs/op
BenchmarkParse/twitter/stdjson-map-4 200 6208910 ns/op 101.71 MB/s 2188398 B/op 31267 allocs/op
BenchmarkParse/twitter/stdjson-struct-4 500 3027482 ns/op 208.59 MB/s 409 B/op 6 allocs/op
BenchmarkParse/twitter/stdjson-empty-struct-4 500 2823266 ns/op 223.68 MB/s 408 B/op 6 allocs/op
BenchmarkParse/twitter/fastjson-4 2000 775716 ns/op 814.10 MB/s 645840 B/op 1 allocs/op
BenchmarkParse/twitter/fastjson-get-4 2000 787570 ns/op 801.85 MB/s 645840 B/op 1 allocs/op
PASS
ok github.com/valyala/fastjson 62.245s
Now it's much closer to the original performance. Here is the CPU profile:
File: fastjson.test
Type: cpu
Time: Jan 22, 2019 at 11:11pm (CST)
Duration: 21.35s, Total samples = 52.83s (247.49%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 36.16s, 68.45% of 52.83s total
Dropped 141 nodes (cum <= 0.26s)
Showing top 10 nodes out of 60
flat flat% sum% cum cum%
6.45s 12.21% 12.21% 42.34s 80.14% github.com/valyala/fastjson.(*Parser).parseObject
5.42s 10.26% 22.47% 8.69s 16.45% github.com/valyala/fastjson.skipWS
5.01s 9.48% 31.95% 42.41s 80.28% github.com/valyala/fastjson.(*Parser).parseValue
3.59s 6.80% 38.75% 3.59s 6.80% github.com/valyala/fastjson.parseRawKey
3.36s 6.36% 45.11% 3.36s 6.36% runtime.memmove
3.27s 6.19% 51.30% 3.27s 6.19% github.com/valyala/fastjson.skipWSSlow
2.83s 5.36% 56.65% 3.54s 6.70% runtime.findObject
2.43s 4.60% 61.25% 2.88s 5.45% github.com/valyala/fastjson.(*Parser).getValue
2.01s 3.80% 65.06% 27.18s 51.45% github.com/valyala/fastjson.(*Parser).parseArray
1.79s 3.39% 68.45% 5.55s 10.51% runtime.wbBufFlush1
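The `File: fastjson.test` line suggests the profile above came from the test binary, most likely via the `-cpuprofile` flag of `go test`, though the thread does not say so. For code outside the test harness, a comparable profile can be collected with `runtime/pprof`. The sketch below is an assumption-laden example: it profiles `encoding/json` rather than fastjson, and `ProfileCPU` and `ParseLoop` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"runtime/pprof"
)

// ProfileCPU runs work with the CPU profiler enabled and returns the
// profile data in the format that `go tool pprof` reads.
func ProfileCPU(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // returns once all profile data has been written
	return buf.Bytes(), nil
}

// ParseLoop parses a small document n times so the profiler has work to sample.
func ParseLoop(n int) {
	doc := []byte(`{"a":[1,2,3],"b":{"c":"d"}}`)
	for i := 0; i < n; i++ {
		var v map[string]interface{}
		if err := json.Unmarshal(doc, &v); err != nil {
			panic(err)
		}
	}
}

func main() {
	prof, err := ProfileCPU(func() { ParseLoop(200000) })
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("cpu.prof", prof, 0o644); err != nil {
		panic(err)
	}
	fmt.Printf("wrote cpu.prof (%d bytes)\n", len(prof))
}
```

The resulting file can be opened with `go tool pprof cpu.prof`, where `top10` prints a table with the same `flat` and `cum` columns as above.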
@valyala can this be merged as well?
This is a great idea, since it requires less RAM when parsing JSON structs with non-constant structure, but it slows down parsing a bit. I tried playing with this PR in the mem-optimize2 branch, but it isn't ready for merging into master.
Any update on this PR? I like the concept of this library, but I'm getting memory leaks when using parserPool, and using it without a pool is slow.