# jsonlex
Fast JSON lexer (tokenizer) with no memory footprint and no garbage collector pressure (zero heap allocations).
## Installation

```
go get -u github.com/dtgorski/jsonlex
```
## Important

Using an `io.Reader` that performs direct system calls (e.g. `os.File`) will result in poor performance. Wrap your input reader in a `bufio.Reader` or, better, a `bytes.Reader` to achieve the best results.
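A minimal sketch of such wrapping (the file name is hypothetical, and error handling is kept terse):

```go
package main

import (
	"bufio"
	"os"
)

func main() {
	f, err := os.Open("document.json") // hypothetical input file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// bufio.Reader batches the underlying system calls and also
	// implements UnreadByte(), which the reentrant Scan() (see below)
	// relies on.
	reader := bufio.NewReader(f)
	_ = reader // hand this reader to jsonlex.NewCursor(...) or Lexer.Scan(...)
}
```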
## Usage A - iterating behaviour (Cursor)
```go
package main

import (
	"bytes"

	"github.com/dtgorski/jsonlex"
)

func main() {
	reader := bytes.NewReader(
		[]byte(`{ "foo": "bar", "baz": 42 }`),
	)
	cursor := jsonlex.NewCursor(reader, nil)

	println(cursor.Curr().String())
	println(cursor.Next().String())

	if !cursor.Next().Is(jsonlex.TokenEOF) {
		println("there is more ...")
	}
}
```
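The three methods above are enough for a full pass over the stream. A sketch (not part of the published examples):

```go
// Drain the token stream: Curr() yields the current token,
// Next() advances, and TokenEOF terminates the loop.
for tok := cursor.Curr(); !tok.Is(jsonlex.TokenEOF); tok = cursor.Next() {
	println(tok.String())
}
```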
## Usage B - emitting behaviour (Yield)
```go
package main

import (
	"bytes"

	"github.com/dtgorski/jsonlex"
)

func main() {
	reader := bytes.NewReader(
		[]byte(`{ "foo": "bar", "baz": 42 }`),
	)

	lexer := jsonlex.NewLexer(
		func(kind jsonlex.TokenKind, load []byte, pos uint) bool {
			// Copy the payload before use, see the note below.
			save := make([]byte, len(load))
			copy(save, load)

			println(pos, kind, string(save))
			return true // keep scanning
		},
	)
	lexer.Scan(reader)
}
```
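The `copy` inside the callback is deliberate: with a zero-allocation design, `load` presumably points into a buffer that the lexer reuses for the next token (an assumption based on the design, not a documented guarantee), so copy the payload if you need it after the callback returns.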
Please note that the `Scan()` function is reentrant: subsequent invocations will continue to consume the available byte stream, as long as you provide a reader that implements `UnreadByte() error` (e.g. `bytes.Reader` or `bufio.Reader`) and you configure the Lexer with the `LexerOptEnableUnreadBuffer` option activated.
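A sketch of that reentrant use, under two assumptions not confirmed by this README: that the option is passed to `NewLexer` after the callback, and that `emit` is a callback like the one in Usage B. Verify both against the package documentation.

```go
// Assumption: LexerOptEnableUnreadBuffer is handed to NewLexer as an
// extra argument; check the package documentation for the actual form.
reader := bytes.NewReader([]byte(`{ "foo": "bar" } { "baz": 42 }`)) // bytes.Reader implements UnreadByte()
lexer := jsonlex.NewLexer(emit, jsonlex.LexerOptEnableUnreadBuffer)

lexer.Scan(reader) // first invocation
lexer.Scan(reader) // continues consuming the same byte stream
```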
## Emitted tokens
| jsonlex | Representation |
|---|---|
| TokenEOF | signals end of file/stream |
| TokenERR | error string (other than EOF) |
| TokenLIT | literal (`true`, `false`, `null`) |
| TokenNUM | float number |
| TokenSTR | `"...\"..."` |
| TokenCOL | `:` colon |
| TokenCOM | `,` comma |
| TokenLSB | `[` left square bracket |
| TokenRSB | `]` right square bracket |
| TokenLCB | `{` left curly brace |
| TokenRCB | `}` right curly brace |
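For orientation, a Usage B callback might branch on these kinds roughly as follows. This is a sketch; that returning `false` aborts the scan is an assumption inferred from the `bool` return value, not stated in this README.

```go
// Hypothetical callback over the token kinds listed above.
func emit(kind jsonlex.TokenKind, load []byte, pos uint) bool {
	switch kind {
	case jsonlex.TokenSTR:
		// raw string payload, escapes preserved as in the input
	case jsonlex.TokenNUM:
		// number as raw bytes, e.g. []byte("42")
	case jsonlex.TokenERR:
		return false // assumption: returning false stops the scan
	}
	return true // keep scanning
}
```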
## Artificial benchmarks
Each benchmark consists of the complete tokenization of a JSON document of a given size (2kB, 20kB, 200kB and 2000kB) using one CPU core. The unit doc/s means tokenized documents per second (taken from the iteration counts of the benchmark runs listed below), so more is better.

The comparison candidate is Go's `encoding/json.Decoder.Token()` implementation.
|  | 2kB | 20kB | 200kB | 2000kB |
|---|---|---|---|---|
| encoding/json | 9910 doc/s | 1152 doc/s | 126 doc/s | 14 doc/s |
| dtgorski/jsonlex | 71880 doc/s | 7341 doc/s | 753 doc/s | 85 doc/s |
```
cpus: 1 core (~8000 BogoMIPS)
goos: linux
goarch: amd64
pkg: github.com/dtgorski/jsonlex/bench

Benchmark_encjson_2kB              9910      120475 ns/op    36528 B/op      1963 allocs/op
Benchmark_encjson_20kB             1152     1040771 ns/op   318432 B/op     18231 allocs/op
Benchmark_encjson_200kB             126     9494534 ns/op  2877968 B/op    164401 allocs/op
Benchmark_encjson_2000kB             14    77593586 ns/op 23355856 B/op   1319126 allocs/op

Benchmark_jsonlex_lexer_2kB       71880       16691 ns/op        0 B/op         0 allocs/op
Benchmark_jsonlex_lexer_20kB       7341      163210 ns/op        0 B/op         0 allocs/op
Benchmark_jsonlex_lexer_200kB       753     1594025 ns/op        0 B/op         0 allocs/op
Benchmark_jsonlex_lexer_2000kB       85    14107866 ns/op        0 B/op         0 allocs/op

Benchmark_jsonlex_cursor_2kB      38002       31776 ns/op     3680 B/op       592 allocs/op
Benchmark_jsonlex_cursor_20kB      4058      300490 ns/op    25168 B/op      5446 allocs/op
Benchmark_jsonlex_cursor_200kB      422     2777058 ns/op   248816 B/op     49141 allocs/op
Benchmark_jsonlex_cursor_2000kB      50    23559879 ns/op  2254896 B/op    396298 allocs/op
```
## Disclaimer

The implementation and features of `jsonlex` follow the YAGNI principle. There is no claim for completeness or reliability.
## @dev

Try `make`:

```
$ make

 make help       Displays this list
 make clean      Removes build/test artifacts
 make test       Runs integrity test with -race
 make bench      Executes artificial benchmarks
 make prof-cpu   Creates CPU profiler output
 make prof-mem   Creates memory profiler output
 make sniff      Checks format and runs linter (void on success)
 make tidy       Formats source files, cleans go.mod
```
## License
MIT - © dtg [at] lengo [dot] org