indexer icon indicating copy to clipboard operation
indexer copied to clipboard

[GC2023] Simple fix for allocation overhead

Open jamesrom opened this issue 2 years ago • 4 comments

A type, Trigram is defined as 3 runes. This saves time spent in the expensive allocation. Benchmarks show a 2.6x improvement with this simple fix alone.

Before:

goos: darwin
goarch: arm64
pkg: indexer
BenchmarkTrigrams-10             3804986               310.9 ns/op
BenchmarkTokenize-10              104250             11147 ns/op
PASS
ok      indexer 2.885s

After:

goos: darwin
goarch: arm64
pkg: indexer
BenchmarkTrigrams-10            22071332                48.99 ns/op
BenchmarkTokenize-10              273313              4267 ns/op
PASS
ok      indexer 2.438s

(Thanks to #2 for the BenchmarkTokenize test)

jamesrom avatar Nov 10 '23 05:11 jamesrom

Sweet. Im going to do a benchmark of all of the PR's against my best effort and turn it into a blog post, and then merge the winner.

boyter avatar Nov 12 '23 21:11 boyter

Nice, I look forward to reading the post. This fix is simple enough that it should work with the other fixes that have been proposed too.

By the way, I ran the benchmarks again with -benchmem and you can see a 13x improvement on allocations in Trigrams which leads to a 4.3x improvement in Tokenize:

Before:

goos: darwin
goarch: arm64
pkg: indexer
BenchmarkTrigrams-10             3684786               311.7 ns/op           288 B/op         13 allocs/op
BenchmarkTokenize-10              105349             11190 ns/op           21608 B/op        312 allocs/op
PASS
ok      indexer 3.181s

After:

goos: darwin
goarch: arm64
pkg: indexer
BenchmarkTrigrams-10            24047827                49.17 ns/op          144 B/op          1 allocs/op
BenchmarkTokenize-10              278169              4193 ns/op           12904 B/op         72 allocs/op
PASS
ok      indexer 3.872s

jamesrom avatar Nov 13 '23 03:11 jamesrom

The only catch I see is that you arent calling .Bytes on the function in your benchmark, which of course means the output is different.

Was your intention to just have that evaluated later in the Itemise function when it is hashing?

For the moment I added a quick change to master where it includes a copy of this so you can compare this against a few other methods,

hyperfine 'indexer --trigram=default' 'indexer --trigram=jamesrom'

As you can see its not any faster at the moment because its calling the .Bytes inside it.

boyter avatar Nov 13 '23 06:11 boyter

Yeah, the trick here is to make each Trigram point to the memory where the runes already live, avoiding extra allocation. The main idea being that you can pass around Trigrams not strings, as you don't need to worry about string encoding if you're just hashing the bytes. Then you can do stuff like what I've done in bc2ecbf13b9e88ce1bd703d2afa6bd4d3f357a1e (BytesFast is 25x faster than Bytes).

I've added more complete test BenchmarkItemise to show how this works.

Before:

goos: darwin
goarch: arm64
pkg: indexer
BenchmarkTrigrams-10                6596            180396 ns/op          226217 B/op       7799 allocs/op
BenchmarkItemise-10                 2503            466337 ns/op          617474 B/op      15979 allocs/op
PASS
ok      indexer 3.797s

After:

goos: darwin
goarch: arm64
pkg: indexer
BenchmarkTrigrams-10                       49664             23261 ns/op          131072 B/op          2 allocs/op
BenchmarkTrigramBytes-10                74693443                15.84 ns/op            0 B/op          0 allocs/op
BenchmarkTrigramBytesFast-10            1000000000               0.6242 ns/op          0 B/op          0 allocs/op
BenchmarkItemise-10                         3018            383156 ns/op          510642 B/op      12348 allocs/op
PASS
ok      indexer 4.896s

1.22x improvement in Itemise ns/op 1.29x improvement in Itemise allocs/op

jamesrom avatar Nov 14 '23 05:11 jamesrom