
Implement/use a byte slice pool

Open kcalvinalvin opened this issue 4 years ago • 2 comments

There are functions in these packages that allocate and de-allocate byte slices many times per call. Using a pool for byte slices reduces these allocations and speeds up the code.

This also adds helper functions for serializing and de-serializing ints to and from the byte slices returned by the pool, since functions such as binary.BigEndian.PutUint64 and binary.BigEndian.Uint64 allocate bytes internally.

Also added comments for types/functions that didn't have any.

kcalvinalvin avatar Feb 17 '21 17:02 kcalvinalvin

Hey - thanks for working on this, but as we talked about on the call, I'm pretty hesitant to use this. It's not actually that complex, but it is kind of weird, and people coming to the code for the first time would wonder "why is it doing this weird thing?". And people modifying the code later may not use the pool, since it's not obvious that they need to.

It seems like most of the code that gets sped up by this is just dumb code that we should actually fix. ParentHash is one that allocates too much - it seems like we could switch to using sha512.New512_256() instead of sha512.Sum512_256 to avoid the append and the new allocation there. Like this:

func parentHash(l, r Hash) (h Hash) {
	s52 := sha512.New512_256()
	s52.Write(l[:])
	s52.Write(r[:])
	copy(h[:], s52.Sum(nil))
	return
}

That probably eliminates the extra allocation. Though we can probably do even better...

(I still think some kind of hash worker goroutine whose hasher gets .Reset() between uses would help, to avoid creating / destroying the hasher - that's probably a bigger allocation issue.)

The other problem mentioned was SerializeSize() for UData, which... yeah, is horrible, because it just serializes the whole thing and then checks how big it was. The pool would help with this, but really the whole function needs to be fixed.

Similarly leaf hashes don't track their own hash, but they could, just like btcutil txs do.

So it makes sense that a slice pool will help, because there's a bunch of dumb / slow code here, but I think the first thing we should do is fix the dumb / slow code, which might end up faster than the pool.

Once we fix these things, if it turns out the pool still gives a significant improvement, then I'd be OK using one.

If there are other slow, GC-heavy, or allocation-heavy areas, please let me know / write them here and I'll take a look.

adiabat avatar Feb 22 '21 20:02 adiabat

Hm yeah, just thinking... even without threads, keeping everything blocking, the ParentHash function could become a method on forest or pollard, and both those structs could keep their own hash.Hash that gets used for that method. Creating a new SHA-512/256 state every time is probably slower than calling Reset() on one that persists.

Also there's all sorts of hardware optimized hashing that I don't think we're using right now.

adiabat avatar Feb 22 '21 20:02 adiabat