ssz
Fast conversion from decimal to bytes
Lodestar spends significant CPU time converting uint types from decimal to bytes and back. For example, converting the balance tree to a flat representation and back consists of roughly 200_000 such operations.
The ssz package hosts the code that performs these conversions. These are two of the implementations involved:
https://github.com/ChainSafe/ssz/blob/be03a3398e9c13b4d6c609ba07dfba7bf2cfe2a6/src/types/basic/uint.ts#L68-L82
https://github.com/ChainSafe/ssz/blob/be03a3398e9c13b4d6c609ba07dfba7bf2cfe2a6/src/types/basic/uint.ts#L84-L98
We should investigate faster alternatives.
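For context, the hot path is little-endian uint64 (de)serialization using plain JS numbers. A rough sketch of the loop-based approach (hypothetical function names, not the actual uint.ts code; values must stay below `Number.MAX_SAFE_INTEGER`, i.e. 2^53 - 1):

```typescript
// Sketch of little-endian uint64 serialization with plain JS numbers.
// Not the actual uint.ts implementation.
function toBytesLoop(value: number, output: Uint8Array, offset: number): void {
  let v = value;
  for (let i = 0; i < 8; i++) {
    output[offset + i] = v % 256; // least-significant byte first (little-endian)
    v = Math.floor(v / 256); // avoid >>>, which truncates to 32 bits
  }
}

function fromBytesLoop(input: Uint8Array, offset: number): number {
  let v = 0;
  // accumulate from the most-significant byte down
  for (let i = 7; i >= 0; i--) {
    v = v * 256 + input[offset + i];
  }
  return v;
}
```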
will investigate :+1:
With https://github.com/ChainSafe/persistent-merkle-tree/pull/52, we can get the deserialized number directly if it fits in less than 32 bits (right now we have to convert h0..h7 to Uint8Array, and from Uint8Array back to number).
```ts
it.only("get deserialized number from h0", () => {
  const BeaconState = new ContainerType({
    fields: {
      slot: number64Type,
    },
  });
  for (let slot = 2 ** 31 - 100; slot <= 2 ** 31; slot++) {
    const tb = BeaconState.createTreeBackedFromStruct({slot});
    const node = tb.tree.getNode(BigInt(1));
    expect(Math.abs(node.h0)).to.be.equal(slot, "failed at slot " + slot);
  }
});
```
It should work with slot, committee index, etc., but not with balance (balances can exceed 32 bits).
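One detail worth noting about the `Math.abs(node.h0)` in the test above: if the hash-object fields pass through signed 32-bit operations (e.g. `| 0`), values at or above 2^31 come back negative. `Math.abs` happens to give the right answer at exactly 2^31 (the top of the tested range), but the general fix is an unsigned reinterpretation with `>>> 0`:

```typescript
// Sketch: recovering an unsigned 32-bit value from a signed h0 field.
const slot = 2 ** 31; // 2147483648, does not fit in a signed int32
const h0 = slot | 0; // -2147483648 under int32 semantics
const viaAbs = Math.abs(h0); // 2147483648: works only at this exact value
const viaUnsigned = h0 >>> 0; // 2147483648: correct for any uint32
// For 2^31 + 1, (value | 0) is -2147483647, Math.abs gives 2147483647 (wrong),
// while >>> 0 still recovers 2147483649.
```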
I've tested using TypedArrays to convert numbers, and it's actually slower than the current implementation in as-sha256:
```
utils
  ✓ hashObjectToByteArray 50023 times    407.5760 ops/s    2.453530 ms/op    x1.406    4073 runs    10.0 s
  ✓ byteArrayToHashObject 50023 times    401.1899 ops/s    2.492585 ms/op    x1.174    4009 runs    10.0 s
```
```ts
/**
 * Pass 8 numbers in an object and write them to byteArr at offset.
 * This function repeats the same assignment, but we intentionally unroll
 * it step by step to improve performance a bit.
 * Note: offset must be a multiple of 4, since Uint32Array views require
 * 4-byte alignment.
 **/
export function hashObjectToByteArray(obj, byteArr, offset) {
  const uint32 = new Uint32Array(byteArr.buffer, byteArr.byteOffset + offset, 8);
  uint32[0] = obj.h0;
  uint32[1] = obj.h1;
  uint32[2] = obj.h2;
  uint32[3] = obj.h3;
  uint32[4] = obj.h4;
  uint32[5] = obj.h5;
  uint32[6] = obj.h6;
  uint32[7] = obj.h7;
}
```
```ts
/**
 * Parse byteArr into an object of 8 numbers.
 * This is the ordering that makes the Uint32Array view agree with the
 * underlying Uint8Array bytes.
 * This function repeats the same read, but we intentionally unroll it
 * step by step to improve performance a bit.
 **/
export function byteArrayToHashObject(byteArr) {
  const uint32 = new Uint32Array(byteArr.buffer, byteArr.byteOffset, 8);
  return {
    h0: uint32[0],
    h1: uint32[1],
    h2: uint32[2],
    h3: uint32[3],
    h4: uint32[4],
    h5: uint32[5],
    h6: uint32[6],
    h7: uint32[7],
  };
}
```
What makes it so slow is the `Uint32Array` constructor: it takes 90% of the CPU time of that function, while setting or reading the numbers is very fast.
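If the constructor dominates, one possible mitigation (a sketch, not something proposed in the thread) is to cache a single `Uint32Array` view per `ArrayBuffer` and reuse it across calls:

```typescript
// Hypothetical per-buffer view cache, so the Uint32Array is constructed at
// most once per ArrayBuffer instead of on every conversion call.
const viewCache = new WeakMap<ArrayBuffer, Uint32Array>();

function getUint32View(buf: ArrayBuffer): Uint32Array {
  let view = viewCache.get(buf);
  if (view === undefined) {
    view = new Uint32Array(buf); // constructed once per buffer
    viewCache.set(buf, view);
  }
  return view;
}
```

This only helps if the same buffers are reused across calls, which is typically the case when hashing with preallocated scratch buffers.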
@dapplion just to confirm, the endianness in ssz serialization is little?
Yes, it's little-endian.
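A quick illustration of what little-endian means here (just a demonstration, not ssz code): the least-significant byte is stored first.

```typescript
// DataView makes the byte order explicit via its boolean littleEndian flag.
const bytes = new Uint8Array(8);
const view = new DataView(bytes.buffer);
view.setBigUint64(0, 0x0102030405060708n, true); // true = little-endian
// bytes now reads [0x08, 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01]
```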
- As a first step: a DataView-based to_bytes is at least 2x faster than the current to_bytes loop for uint64 values above 32 bits, and 3.5x faster for uint64 values below 32 bits. However, when I plugged it in, the end-to-end speedup was only 2.5%-3.5% (another 2.5%-3.5% can potentially come from the reverse leg, bytes to number, so maybe 5-7% overall with the to_bytes optimization).
- Will continue working on the other leg (bytes to number) and corroborate the above hypothesis of a potential 7% speedup.
- While working through the optimization and code flow, I identified other potential optimizations around the use of DataView, but will do a further code drilldown before establishing the viability of that scenario.
- As discussed before, another potential optimization is making merkleization multicore. Will tackle that in a separate PR.
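The DataView approach described in the first bullet could look roughly like this (a sketch with hypothetical names, under the assumption that values fit in a JS number, i.e. stay below 2^53):

```typescript
// Hypothetical DataView-based little-endian uint64 conversion, splitting the
// value into low and high 32-bit halves.
function toBytesDataView(value: number, output: Uint8Array, offset: number): void {
  const view = new DataView(output.buffer, output.byteOffset);
  view.setUint32(offset, value >>> 0, true); // low 32 bits (value mod 2^32)
  view.setUint32(offset + 4, Math.floor(value / 2 ** 32), true); // high bits
}

function fromBytesDataView(input: Uint8Array, offset: number): number {
  const view = new DataView(input.buffer, input.byteOffset);
  const lo = view.getUint32(offset, true); // low 32 bits, little-endian
  const hi = view.getUint32(offset + 4, true); // high 32 bits
  return hi * 2 ** 32 + lo;
}
```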
@g11tech Thanks! Keep us updated
@dapplion got an 88% speedup on the normal uint tree-backed benchmark (will need to check the > 2^32 number increment test, but it should be 30%+ there), and a marginal 4% improvement on the new uint64 tree-backed. Will continue checking this further.
before opt:
after opt: