go-algorand
perf: allocation overhead PoC
Research and implement improvements to reduce memory pressure, and check them against the new benchmarks (block asm / block validate). Use the -membench param to see allocations.
These are some things to try:
- [ ] optimize EvalParams allocations per txn group
- [ ] pass ConsensusParams by pointer instead of copying it
- [ ] pass the Transaction object by pointer instead of copying it
The task is basically research: try some dirty hacks and see whether they lead to lower GC times.
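To put a number on "lower GC times" beyond allocation counts, the runtime's own statistics can be sampled around a workload. A minimal sketch using only the standard library (not part of the issue's -membench benchmarks; the workload below is a placeholder):

```go
// Coarse GC-time measurement around a workload, using only the standard
// library; a sketch, not part of the existing benchmarks. The workload
// below is a placeholder for block assembly / validation.
package main

import (
	"fmt"
	"runtime"
)

// gcDelta runs work and returns how much cumulative GC pause time and
// how many GC cycles it added, as seen by runtime.ReadMemStats.
func gcDelta(work func()) (pauseNs uint64, cycles uint32) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	return after.PauseTotalNs - before.PauseTotalNs, after.NumGC - before.NumGC
}

func main() {
	pauseNs, cycles := gcDelta(func() {
		// Placeholder allocation-heavy workload.
		var keep [][]byte
		for i := 0; i < 1<<15; i++ {
			keep = append(keep, make([]byte, 1024))
		}
		_ = keep
	})
	fmt.Printf("GC cycles: %d, added pause time: %d ns\n", cycles, pauseNs)
}
```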
Additional benchmarking methodology:
- Run a local network with EnableProfiler and EnableDeveloperAPI config options
- Capture heap and CPU profiles from the pprof endpoints:
```sh
curl -o ~/networks/two-fut/$(date +'%Y-%m-%dT%H-%M-%S').heap http://$(cat $NETWORKDIR/Primary/algod.net)/debug/pprof/heap -H "X-Algo-API-Token: $(cat $NETWORKDIR/Primary/algod.admin.token)"
curl --silent -o algod.cpu.pprof http://`cat ${ALGORAND_DATA}/algod.net`/urlAuth/`cat ${ALGORAND_DATA}/algod.admin.token`/debug/pprof/profile?seconds=30
```
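The captured files can then be inspected with `go tool pprof`. For reference, the /debug/pprof/heap and /debug/pprof/profile endpoints hit above are the standard ones registered by Go's net/http/pprof package; a standalone sketch of that mechanism (not algod's actual EnableProfiler wiring) looks like:

```go
// Standalone sketch of the stock /debug/pprof endpoints. This is not how
// algod wires EnableProfiler; it only shows the standard net/http/pprof
// mechanism that serves /debug/pprof/heap and /debug/pprof/profile.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// curl http://localhost:6060/debug/pprof/heap
	// curl http://localhost:6060/debug/pprof/profile?seconds=30
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```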
mem pool example: https://github.com/algorand/go-algorand/pull/1880
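For orientation only, the general shape of such a memory pool in Go is sync.Pool; the sketch below is illustrative, and its types and names are hypothetical rather than taken from that PR:

```go
// Illustrative sync.Pool sketch of the "mem pool" idea: reuse a scratch
// buffer across transaction-group evaluations instead of allocating a
// fresh one each time. Types and names are hypothetical, not from the PR.
package main

import "sync"

type scratch struct {
	buf []byte
}

var scratchPool = sync.Pool{
	New: func() interface{} { return &scratch{buf: make([]byte, 0, 4096)} },
}

func evalGroup(groupLen int) {
	s := scratchPool.Get().(*scratch)
	defer func() {
		s.buf = s.buf[:0] // reset before returning to the pool
		scratchPool.Put(s)
	}()
	// ... use s.buf as a reusable scratch area while evaluating the group ...
	_ = groupLen
}

func main() { evalGroup(3) }
```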
I would be surprised if there's much to be optimized in EvalParams allocation. I converted to creating a single EvalParams per group a while back. Further, it creates a small, stunted EvalParams if there are no app calls, as that is good enough for pays, etc.
I don't think we're going to "see" any of the copying overhead (of passing Transaction, ConsensusParams, or BlockHdr by value) in allocation or GC costs. It's just an execution tax on all of the code that makes those calls. There's a copy, but no allocation.
For example
```go
// Microbenchmarks comparing a by-value Transaction argument with a
// by-pointer one (assumed to live in the data/transactions package, so
// Transaction and Header are in scope). //go:noinline keeps the calls
// from being inlined away.
package transactions

import (
	"testing"

	"github.com/algorand/go-algorand/data/basics"
)

//go:noinline
func byRef(a basics.MicroAlgos, txn *Transaction) uint64 {
	return a.Raw / txn.Fee.Raw
}

//go:noinline
func byCopy(a basics.MicroAlgos, txn Transaction) uint64 {
	return a.Raw / txn.Fee.Raw
}

func BenchmarkCopyTxn(b *testing.B) {
	b.ReportAllocs()
	var total uint64
	p := Transaction{Header: Header{Fee: basics.MicroAlgos{Raw: 7}}}
	algos := basics.MicroAlgos{Raw: 450_000}
	for i := 0; i < b.N; i++ {
		total += byCopy(algos, p) // whole Transaction copied per call
	}
}

func BenchmarkRefTxn(b *testing.B) {
	b.ReportAllocs()
	var total uint64
	p := Transaction{Header: Header{Fee: basics.MicroAlgos{Raw: 7}}}
	algos := basics.MicroAlgos{Raw: 450_000}
	for i := 0; i < b.N; i++ {
		total += byRef(algos, &p) // only a pointer passed per call
	}
}
```
This shows no allocations, so presumably no GC impact, but roughly 15 ns of per-call overhead for the copy:
```
BenchmarkCopyTxn-8    66104863     18.06 ns/op    0 B/op    0 allocs/op
BenchmarkRefTxn-8    413728970     2.895 ns/op    0 B/op    0 allocs/op
```
The issue is that we do not know whether all of these are stack copies or heap escapes. Figuring that out is one of the goals as well.
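One way to answer that is the compiler's escape analysis output, e.g. running the benchmarks with `go test -gcflags='-m' -bench . ./data/transactions/` (package path assumed). A toy sketch of the kind of distinction it reports; the exact diagnostics vary by Go version:

```go
// Toy illustration of "stack copy" vs. "heap escape"; build with
// `go build -gcflags=-m` to see the compiler's escape decisions.
package main

import "fmt"

type Big struct {
	payload [1024]byte
}

//go:noinline
func stays(b *Big) int {
	// The pointee is only read here, so escape analysis typically reports
	// something like "b does not escape": the caller's Big can remain on
	// its stack and no heap allocation is needed.
	return int(b.payload[0])
}

var sink *Big

//go:noinline
func escapes(b *Big) {
	// Storing the pointer in a global forces the value to outlive the
	// call, so the compiler typically reports "leaking param: b" here and
	// "moved to heap" for the caller's value.
	sink = b
}

func main() {
	var a, b Big
	fmt.Println(stays(&a)) // a can stay on the stack
	escapes(&b)            // b escapes to the heap
}
```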