rjson icon indicating copy to clipboard operation
rjson copied to clipboard

It may be better to review the benchmark data

Open goccy opened this issue 4 years ago • 1 comments

When I read the benchmark code of json-iterator/go in the benchmark of BenchmarkGetRepoValues, I noticed that it uses the feature that can stop the decoding process in the middle. ( https://github.com/WillAbides/rjson/blob/main/benchmarks/jsoniter.go#L100-L102 )

Of course, it is correct to use the specific feature to improve performance, but this optimization is highly dependent on the input data, so for example, if the key name of full_name is the end of the input data, benchmark results so different.

Therefore, when measuring using the best case, it is better to consider the worst case as well, or to use a benchmark that does not depend on the input data.

BEST

If archived , full_name , forks are in the top of input data .

$ go test -bench BenchmarkGetRepoValues/jsoniter
goos: darwin
goarch: amd64
pkg: github.com/willabides/rjson/benchmarks
BenchmarkGetRepoValues/jsoniter-16               5982798               192 ns/op              48 B/op          4 allocs/op
PASS
ok      github.com/willabides/rjson/benchmarks  1.655s

WORST

If full_name is in the end of input data .

$ go test -bench BenchmarkGetRepoValues/jsoniter
goos: darwin
goarch: amd64
pkg: github.com/willabides/rjson/benchmarks
BenchmarkGetRepoValues/jsoniter-16                117582              9576 ns/op            1680 B/op        123 allocs/op
PASS
ok      github.com/willabides/rjson/benchmarks  2.846s

goccy avatar Apr 07 '21 13:04 goccy

I know what you mean, and it's not just jsoniter. I exited early in all the parsers where that was available, so rjson and jsonparser are also exiting early.

I did the test that way because that is a feature I use frequently for work. We need to get just a few items from each document we parse, so we only read far enough into the document to find what we need then discard it.

It is a bit arbitrary that the last element is where it is. Maybe we should add a second version of this benchmark that also gets "watchers_count" at the end of the document.

WillAbides avatar Apr 07 '21 15:04 WillAbides