fpc Enable reader/writer re-use, reduce allocations, fix benchmarks

Enable reader/writer re-use, reduce allocations, fix benchmarks

Open danclarke opened this issue 1 year ago • 0 comments

Overview

This is a very cool project! While integrating the library into a project, I found a few places where performance could be improved relatively easily. My specific use case involved writing many sequences separately, requiring many new writers. This caused a lot of overhead, which I’ve now mostly removed.

While working with the benchmark suite I found the encoding benchmarks were misleading. If the number of iterations of the benchmark exceeded the number of values in the slice, the encoder would essentially be a ‘no-op’ since there was no more data. This dramatically increased the apparent throughput. In addition, byte buffers were re-used for each iteration, this isn’t possible. The ‘Reset’ method clears the buffer, again resulting in a no-op read, leading to inflated numbers. I’ve fixed these benchmarks and added more for my specific use case.

Benchmarks

A summary of notable benchmark changes is below with a 13900K, after the benchmark fixes:

Benchmark	Original MB/sec	New MB/sec
BenchmarkBlockEncode	650.27MB/s	1,000.78MB/s
BenchmarkLeadingZeroBytes	10,610.15MB/s	83,906.17MB/s
BenchmarkWriter	6,490.96MB/s	7,141.55MB/s
BenchmarkReadFloats	266.74MB/s	320.11MB/s
BenchmarkReadFloatsRepeated (new)	264.93MB/s	320.78MB/s
BenchmarkReadFloatsRepeatedReuse (new)		482.36MB/s
BenchmarkLargeEncode (new)	650.72MB/s	744.95MB/s
BenchmarkLargeEncodeReuse (new)		815.66MB/s

The BenchmarkWriter performance is so high I’m suspicious… Not listed are reduced allocations and RAM usage which have both dropped significantly.

Summary of changes

clz replaced with native Go method that uses native CPU instruction if available
All major structs updated to re-use buffers where possible
Almost zero allocation once 'warmed up'
Clear / Reset methods added where possible to allow re-use of structs
readFull change by @kamstrup integrated
Updated Go version so we have access to clear()

I’ve not run a profiler or gone to extreme lengths to optimise so I’m sure there’s more on the table.

Oct 02 '23 21:10 danclarke

fpc fpc copied to clipboard

Enable reader/writer re-use, reduce allocations, fix benchmarks

Overview

Benchmarks

Summary of changes

fpc
fpc copied to clipboard