wire: only borrow/return binaryFreeList buffers at the message level
This PR optimizes the wire package's serialization of small buffers by minimizing the number of borrow/return round trips during message serialization. Currently the wire package uses a binaryFreeList from which 8-byte buffers are borrowed and returned for the purpose of serializing small integers and varints.
Problem
To understand the problem, consider calling WriteVarInt on a number greater than 0xfc, which requires writing a discriminant byte followed by a 2-, 4-, or 8-byte value. For instance, writing 20,000 will invoke PutUint8 and then PutUint16. Expanding this out to examine the message passing, we see:
```
buffer <- freelist
PutUint8(buffer, discriminant)
w.Write(buffer)
freelist <- buffer
buffer <- freelist
PutUint16(buffer, value)
w.Write(buffer)
freelist <- buffer
```
Each `<-` requires a channel select, which more or less bears the performance implications of a mutex. This cost, in addition to the need to wake other goroutines and switch execution, imparts a significant performance penalty. In the context of block serialization, several hundred thousand of these operations may be performed.
Solution
In our example above, we can improve this by using only two `<-` operations, one to borrow and one to return, like so:
```
buffer <- freelist
PutUint8(buffer, discriminant)
w.Write(buffer)
PutUint16(buffer, value)
w.Write(buffer)
freelist <- buffer
```
As expected, cutting the number of channel sends in half also cuts the latency in half, as can be seen in the benchmarks below for larger VarInts.
The remainder of this PR propagates this pattern all the way up to the top level of messages in the wire package, such that serializing or deserializing a message only incurs one borrow and one return. Subroutines are made to conditionally borrow from the binarySerializer if the invoker has not provided them with a buffer, and to conditionally return it if they indeed were required to borrow.
A good example of how these channel sends/receives can add up is MsgTx serialization, which is now upwards of 80% faster as a result of these optimizations:
```
benchmark                        old ns/op     new ns/op     delta
BenchmarkSerializeTx-8           683           142           -79.21%
BenchmarkSerializeTxSmall-8      724           143           -80.25%
BenchmarkSerializeTxLarge-8      1476002       182111        -87.66%
```
Preliminary Benchmarks
```
benchmark                          old ns/op     new ns/op     delta
BenchmarkWriteVarInt1-8            71.9          74.6          +3.76%
BenchmarkWriteVarInt3-8            148           70.9          -52.09%
BenchmarkWriteVarInt5-8            149           71.0          -52.35%
BenchmarkWriteVarInt9-8            147           72.8          -50.48%
BenchmarkReadVarInt1-8             78.8          77.9          -1.14%
BenchmarkReadVarInt3-8             159           87.7          -44.84%
BenchmarkReadVarInt5-8             155           88.3          -43.03%
BenchmarkReadVarInt9-8             158           86.9          -45.00%
BenchmarkReadVarStr4-8             120           119           -0.83%
BenchmarkReadVarStr10-8            138           130           -5.80%
BenchmarkWriteVarStr4-8            101           105           +3.96%
BenchmarkWriteVarStr10-8           103           105           +1.94%
BenchmarkReadOutPoint-8            91.3          28.7          -68.57%
BenchmarkWriteOutPoint-8           78.9          10.2          -87.07%
BenchmarkReadTxOut-8               245           118           -51.84%
BenchmarkWriteTxOut-8              151           93.5          -38.08%
BenchmarkReadTxIn-8                338           139           -58.88%
BenchmarkWriteTxIn-8               238           31.0          -86.97%
BenchmarkDeserializeTxSmall-8      1119          586           -47.63%
BenchmarkDeserializeTxLarge-8      2476063       1275815       -48.47%
BenchmarkSerializeTx-8             683           142           -79.21%
BenchmarkSerializeTxSmall-8        724           143           -80.25%
BenchmarkSerializeTxLarge-8        1476002       182111        -87.66%
BenchmarkReadBlockHeader-8         406           69.1          -82.98%
BenchmarkWriteBlockHeader-8        431           21.1          -95.10%
BenchmarkDecodeGetHeaders-8        13537         12238         -9.60%
BenchmarkDecodeHeaders-8           1025275       236709        -76.91%
BenchmarkDecodeGetBlocks-8         13206         11684         -11.53%
BenchmarkDecodeAddr-8              337977       157519        -53.39%
BenchmarkDecodeInv-8               5990935       1898169       -68.32%
BenchmarkDecodeNotFound-8          6285831       1864701       -70.33%
BenchmarkDecodeMerkleBlock-8       4357          2606          -40.19%
BenchmarkTxHash-8                  1928          1222          -36.62%
BenchmarkDoubleHashB-8             993           969           -2.42%
BenchmarkDoubleHashH-8             932           1029          +10.41%

benchmark                          old allocs    new allocs    delta
BenchmarkWriteBlockHeader-8        4             0             -100.00%
// all others remain 0 alloc

benchmark                          old bytes     new bytes     delta
BenchmarkWriteBlockHeader-8        16            0             -100.00%
// all others remain unchanged
```
Notes
I'm still in the process of going through and adding benchmarks to top-level messages in order to gauge the overall performance benefit; expect more to be added at a later point.
There are a few remaining messages which have not yet been optimized, e.g. MsgAlert, MsgVersion, etc. I plan to add those as well, but decided to start with the ones that were more performance critical.
@jcvernaleo (as per #1530)
- low priority
- enhancement
By the way @cfromknecht this is awesome!
FWIW I rebased this PR (locally) over #1684 and reran the CalcSigHash benchmarks.
Since calculating the legacy sighash involves serializing a copy of the original transaction,
optimizing our wire encoding speeds up the sighash calculation speed significantly.
This is a comparison of CalcSigHash before and after applying the wire optimizations to the tokenizer branch:
```
benchmark                   old ns/op     new ns/op     delta
BenchmarkCalcSigHash-8      3619477       1758789       -51.41%

benchmark                   old allocs    new allocs    delta
BenchmarkCalcSigHash-8      801           712           -11.11%

benchmark                   old bytes     new bytes     delta
BenchmarkCalcSigHash-8      1293354       1290504       -0.22%
```
Using an 80% transaction serialization speedup to ballpark, this indicates roughly 2/3 of the latency incurred by CalcSigHash is transaction serialization, and with this change it becomes about 1/4.
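The 2/3 and 1/4 ballparks can be sanity-checked from the benchmark numbers above, assuming serialization itself got roughly 80% faster:

```go
package main

import "fmt"

func main() {
	oldTotal := 3619477.0 // old BenchmarkCalcSigHash ns/op
	newTotal := 1758789.0 // new BenchmarkCalcSigHash ns/op
	speedup := 0.80       // assumed serialization speedup

	// If serialization was a fraction f of the old latency, then
	// newTotal = oldTotal*(1-f) + oldTotal*f*(1-speedup).
	// Solving for f:
	f := (1 - newTotal/oldTotal) / speedup
	fmt.Printf("old serialization share: %.2f\n", f) // ~0.64, roughly 2/3

	// Fraction of the new latency still spent serializing:
	newShare := oldTotal * f * (1 - speedup) / newTotal
	fmt.Printf("new serialization share: %.2f\n", newShare) // ~0.26, roughly 1/4
}
```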
Pull Request Test Coverage Report for Build 541264840
- 572 of 614 (93.16%) changed or added relevant lines in 22 files are covered.
- 45 unchanged lines in 4 files lost coverage.
- Overall coverage increased (+0.3%) to 53.726%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| wire/msggetcfcheckpt.go | 12 | 14 | 85.71% |
| wire/msgcfilter.go | 14 | 18 | 77.78% |
| wire/msggetcfheaders.go | 16 | 20 | 80.0% |
| wire/msggetcfilters.go | 16 | 20 | 80.0% |
| wire/msgtx.go | 94 | 99 | 94.95% |
| wire/msgcfcheckpt.go | 14 | 22 | 63.64% |
| wire/msgcfheaders.go | 18 | 33 | 54.55% |
| Total: | 572 | 614 | 93.16% |
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| wire/msgtx.go | 1 | 94.87% |
| peer/peer.go | 2 | 76.18% |
| btcec/signature.go | 6 | 91.64% |
| wire/common.go | 36 | 91.91% |
| Total: | 45 | |
| Totals | |
|---|---|
| Change from base Build 535940167: | 0.3% |
| Covered Lines: | 21180 |
| Relevant Lines: | 39422 |
💛 - Coveralls
@cfromknecht Could you please rebase this PR, since https://github.com/btcsuite/btcd/pull/1769 landed recently?