# Change `Block<T>`'s Bencodex representation

## Preamble
The following benchmark was run with some mock data:
| Method | Mean | Error | StdDev | Median |
|---|---|---|---|---|
| Encode | 419.2 ms | 8.38 ms | 17.85 ms | 415.5 ms |
| EncodeSerialize | 220.6 ms | 4.40 ms | 12.11 ms | 217.4 ms |
| DoubleEncode | 286.4 ms | 6.80 ms | 19.94 ms | 282.6 ms |
| DoubleEncodeSerialize | 264.1 ms | 6.08 ms | 17.93 ms | 259.3 ms |
| Decode | 325.5 ms | 7.08 ms | 20.77 ms | 319.1 ms |
| DeserializeDecode | 513.3 ms | 12.95 ms | 38.17 ms | 506.3 ms |
| DoubleDecode | 570.5 ms | 11.39 ms | 17.73 ms | 571.3 ms |
| DeserializeDoubleDecode | 607.7 ms | 10.94 ms | 13.02 ms | 609.7 ms |
For context, I used something akin to the following structure:
```csharp
using Bencodex;        // Codec
using Bencodex.Types;  // Dictionary, IValue

class Member
{
    public byte[] A { get; }
    public byte[] B { get; }

    // Encodes the member "in place" as a nested Bencodex dictionary.
    public IValue Encoded => Dictionary.Empty
        .Add("A", A)
        .Add("B", B);
}

class Owner
{
    private static readonly Codec _codec = new Codec();

    public Member X { get; }
    public Member Y { get; }

    // Single encoding: members stay as nested dictionaries.
    public IValue Encoded => Dictionary.Empty
        .Add("X", X.Encoded)
        .Add("Y", Y.Encoded);

    // Double encoding: members are first serialized to bytes,
    // then embedded as opaque byte strings.
    public IValue DoubleEncoded => Dictionary.Empty
        .Add("X", _codec.Encode(X.Encoded))
        .Add("Y", _codec.Encode(Y.Encoded));
}
```
Given this, I think it'd be safe to say that the expected Bencodex encoded format for an `Owner` object would be `Owner.Encoded`, which would look like the following:
```
{
  "X": {
    "A": 0x1234, // some binary
    "B": 0x2345  // some binary
  },
  "Y": {
    "A": 0x3456, // some binary
    "B": 0x4567  // some binary
  }
}
```
On the other hand, `Owner.DoubleEncoded` looks like the following:

```
{
  "X": 0x5678, // binary representation of X
  "Y": 0x6789  // binary representation of Y
}
```
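The difference between the two shapes can be sketched with a toy encoder. The following Python is a minimal bencoding-style encoder, not the actual Bencodex.Net codec (Bencodex extends bencoding with unicode strings, booleans, and null); it only illustrates that double-encoding wraps each member in an extra length-prefixed byte string, making the output slightly larger:

```python
# Toy bencoding-style encoder (NOT the real Bencodex.Net codec), used only to
# compare the single- and double-encoded shapes of the mock Owner object.

def encode(value):
    """Encode bytes, lists, and dicts in classic bencoding."""
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(encode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencoding requires dictionary keys in sorted order.
        return b"d" + b"".join(
            encode(k) + encode(v) for k, v in sorted(value.items())) + b"e"
    raise TypeError(f"unsupported type: {type(value)}")

member_x = {b"A": bytes.fromhex("1234"), b"B": bytes.fromhex("2345")}
member_y = {b"A": bytes.fromhex("3456"), b"B": bytes.fromhex("4567")}

# Single encoding: members are encoded once, in place, as nested dictionaries.
single = encode({b"X": member_x, b"Y": member_y})

# Double encoding: members are encoded to bytes first, then embedded in the
# outer dictionary as opaque, length-prefixed byte strings.
double = encode({b"X": encode(member_x), b"Y": encode(member_y)})

print(len(single), len(double))  # → 40 46; double-encoding adds overhead
```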
Currently, a `Block<T>`'s `LastCommit` and its list of `Transaction<T>`s are "doubly encoded" in the sense that they are first encoded into an `IValue` and then serialized to bytes. This has some odd repercussions, as seen in the benchmark above. As we often need to convert Bencodex-encoded `IValue`s into byte arrays for storing to `IStore` and/or transmitting through `ITransport<T>`, I've added additional tests to measure how fast things can be converted to byte arrays. In summary:
- `Encode`: measures how fast `Owner.Encoded` can be computed.
- `EncodeSerialize`: measures how fast `Codec.Encode(Owner.Encoded)` can be computed.
- `DoubleEncode`: measures how fast `Owner.DoubleEncoded` can be computed.
- `DoubleEncodeSerialize`: measures how fast `Codec.Encode(Owner.DoubleEncoded)` can be computed.
- `Decode`, `DeserializeDecode`, `DoubleDecode`, `DeserializeDoubleDecode`: the reverse of the above four.
Some oddities that stand out:

- For whatever reason, `Encode` is a lot slower than `DoubleEncode`. This is probably due to some quirks in the inner workings of Bencodex.Net, possibly lazy evaluation and/or the internal immutable data structures used.
- Again, for whatever reason, `Encode` is a lot slower than `EncodeSerialize`. I also have no idea how this can be the case; probably for a similar reason as above.
- Then again, `EncodeSerialize` is faster than `DoubleEncodeSerialize` despite `Encode` being slower than `DoubleEncode`.
- However, for `Decode`, `DeserializeDecode`, `DoubleDecode`, and `DeserializeDoubleDecode`, the speeds are as expected, i.e. proportional to the complexity of the operations required for each.
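The extra work behind the double-decode numbers can be sketched as follows. This is a minimal bencoding-style decoder (byte strings and dictionaries only), not the actual Bencodex.Net codec, and the payload bytes are mock data mirroring the `Owner.DoubleEncoded` example above:

```python
# Sketch of the "double decode" path: a doubly-encoded payload must be decoded
# twice -- once for the outer dictionary, and once more for each embedded
# byte string. Minimal bencoding decoder; NOT the real Bencodex.Net codec.

def decode(data, i=0):
    """Decode one bencoded value at index i; return (value, next index)."""
    if data[i:i + 1] == b"d":
        out, i = {}, i + 1
        while data[i:i + 1] != b"e":
            key, i = decode(data, i)
            val, i = decode(data, i)
            out[key] = val
        return out, i + 1
    colon = data.index(b":", i)          # byte string: <length>:<bytes>
    length = int(data[i:colon])
    return data[colon + 1:colon + 1 + length], colon + 1 + length

# A mock doubly-encoded "Owner": members X and Y are embedded as opaque,
# length-prefixed byte strings inside the outer dictionary.
payload = (
    b"d"
    b"1:X" b"16:d1:A2:\x12\x341:B2:\x23\x45e"
    b"1:Y" b"16:d1:A2:\x34\x561:B2:\x45\x67e"
    b"e"
)

outer, _ = decode(payload)
assert isinstance(outer[b"X"], bytes)  # first pass yields opaque bytes...
x, _ = decode(outer[b"X"])             # ...which need a second decode pass
print(x[b"A"].hex(), x[b"B"].hex())    # → 1234 2345
```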
## Issue & Suggestion
I would argue that the current implementation of `Block<T>`'s Bencodex representation, which uses a "double-encoding" scheme, is not particularly intuitive. Additionally, the scheme is not well defined, in the sense that it is unclear whether "double-encoding" should also be used for lower-level components of a `Block<T>` (`IAction`, `Transaction<T>`, and `BlockCommit`, to name a few). As seen above, there doesn't seem to be a clear performance benefit, and there is the slight downside of larger encoded data as a result of "double-encoding". I suggest we move to a "single-encoding" scheme if possible. Sadly, this would require another protocol version bump.
As an addendum, the following is the result when encoding as `List` instead of `Dictionary`:
| Method | Mean | Error | StdDev | Median |
|---|---|---|---|---|
| Encode | 244.34 ms | 4.883 ms | 8.023 ms | 242.97 ms |
| EncodeSerialize | 76.34 ms | 1.515 ms | 3.884 ms | 76.28 ms |
| DoubleEncode | 121.65 ms | 2.425 ms | 3.478 ms | 121.20 ms |
| DoubleEncodeSerialize | 105.52 ms | 2.004 ms | 4.878 ms | 105.29 ms |
| Decode | 314.03 ms | 6.201 ms | 14.977 ms | 313.55 ms |
| DeserializeDecode | 371.42 ms | 12.797 ms | 36.093 ms | 367.47 ms |
| DoubleDecode | 396.38 ms | 11.225 ms | 32.920 ms | 384.43 ms |
| DeserializeDoubleDecode | 373.17 ms | 10.222 ms | 29.164 ms | 368.37 ms |
That is, when we have the encoded formats

```
[[0x1234, 0x2345], [0x3456, 0x4567]]
```

and

```
[0x5678, 0x6789]
```

respectively.
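As a rough illustration of why the `List` layout encodes smaller than the `Dictionary` one — it drops the per-field keys entirely — here is a sketch using the same toy bencoding-style encoder (not the actual Bencodex.Net codec) and the mock values above:

```python
# Toy bencoding-style encoder (NOT the real Bencodex.Net codec), used to
# compare the encoded sizes of the List- and Dictionary-based layouts.

def encode(value):
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(encode(v) for v in value) + b"e"
    if isinstance(value, dict):
        return b"d" + b"".join(
            encode(k) + encode(v) for k, v in sorted(value.items())) + b"e"
    raise TypeError(f"unsupported type: {type(value)}")

a, b, c, d = (bytes.fromhex(h) for h in ("1234", "2345", "3456", "4567"))

# List layout: positional fields only, no keys to encode.
as_list = encode([[a, b], [c, d]])

# Dictionary layout: every field carries an encoded key ("X", "Y", "A", "B").
as_dict = encode({b"X": {b"A": a, b"B": b}, b"Y": {b"A": c, b"B": d}})

print(len(as_list), len(as_dict))  # → 22 40; keys account for the difference
```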
A few things to note:

- Overall performance is faster for `Encode`, `EncodeSerialize`, `DoubleEncode`, and `DoubleEncodeSerialize`, and the relative relations are the same.
- There isn't much performance gain for `Decode`, but `DeserializeDecode`, `DoubleDecode`, and `DeserializeDoubleDecode` are substantially faster.