
Change `Block<T>`'s `Bencodex` representation

Open greymistcube opened this issue 2 years ago • 2 comments

Preamble

The following benchmark was done with some mock data:

Method                       Mean      Error     StdDev     Median
Encode                   419.2 ms    8.38 ms   17.85 ms   415.5 ms
EncodeSerialize          220.6 ms    4.40 ms   12.11 ms   217.4 ms
DoubleEncode             286.4 ms    6.80 ms   19.94 ms   282.6 ms
DoubleEncodeSerialize    264.1 ms    6.08 ms   17.93 ms   259.3 ms
Decode                   325.5 ms    7.08 ms   20.77 ms   319.1 ms
DeserializeDecode        513.3 ms   12.95 ms   38.17 ms   506.3 ms
DoubleDecode             570.5 ms   11.39 ms   17.73 ms   571.3 ms
DeserializeDoubleDecode  607.7 ms   10.94 ms   13.02 ms   609.7 ms

For context, I used something akin to the following structure:

using Bencodex;
using Bencodex.Types;

class Member
{
    public Member(byte[] a, byte[] b) => (A, B) = (a, b);

    public byte[] A { get; }
    public byte[] B { get; }

    // "Single-encoded" form: fields become entries of a Bencodex Dictionary.
    public IValue Encoded => Dictionary.Empty
        .Add("A", A)
        .Add("B", B);
}

class Owner
{
    private static readonly Codec _codec = new Codec();

    public Owner(Member x, Member y) => (X, Y) = (x, y);

    public Member X { get; }
    public Member Y { get; }

    // "Single-encoded" form: members are nested as Bencodex Dictionaries.
    public IValue Encoded => Dictionary.Empty
        .Add("X", X.Encoded)
        .Add("Y", Y.Encoded);

    // "Double-encoded" form: members are serialized to bytes first,
    // then embedded as opaque byte strings.
    public IValue DoubleEncoded => Dictionary.Empty
        .Add("X", _codec.Encode(X.Encoded))
        .Add("Y", _codec.Encode(Y.Encoded));
}
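
Roughly, the corresponding decode paths could look like the following. This is a sketch rather than the exact benchmark code; it assumes the constructors above, Bencodex.NET's string indexer on Dictionary, and Binary.ToByteArray():

using Bencodex;
using Bencodex.Types;

static class Decoding
{
    private static readonly Codec _codec = new Codec();

    private static Member DecodeMember(IValue encoded)
    {
        var dict = (Dictionary)encoded;
        return new Member(
            ((Binary)dict["A"]).ToByteArray(),
            ((Binary)dict["B"]).ToByteArray());
    }

    // Decode: members come back as nested Dictionaries, so one pass is enough.
    public static Owner Decode(IValue encoded)
    {
        var dict = (Dictionary)encoded;
        return new Owner(DecodeMember(dict["X"]), DecodeMember(dict["Y"]));
    }

    // DoubleDecode: members come back as opaque byte strings,
    // each of which needs a second Codec.Decode pass.
    public static Owner DoubleDecode(IValue encoded)
    {
        var dict = (Dictionary)encoded;
        return new Owner(
            DecodeMember(_codec.Decode(((Binary)dict["X"]).ToByteArray())),
            DecodeMember(_codec.Decode(((Binary)dict["Y"]).ToByteArray())));
    }
}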

Given this, I think it'd be safe to say the expected Bencodex-encoded format for an Owner object would be Owner.Encoded, which would look like the following:

{
  "X": {
    "A": 0x1234,  // some binary
    "B": 0x2345   // some binary
  },
  "Y": {
    "A": 0x3456,  // some binary
    "B": 0x4567   // some binary
  },
}

On the other hand, Owner.DoubleEncoded looks like the following:

{
  "X": 0x5678,    // binary representation of X
  "Y": 0x6789     // binary representation of Y
}

Currently, LastCommit and the list of Transaction<T>s for a Block<T> are "doubly encoded" in the sense that they are first encoded into an IValue and then serialized to bytes. This has some odd repercussions, as seen in the benchmark above. As we often need to convert Bencodex-encoded IValues into byte arrays for storing to IStore and/or transmitting through ITransport<T>, I've added additional tests to measure how fast things can be converted to byte arrays. In summary (a rough sketch of the harness follows the list):

  • Encode: Measures how fast Owner.Encoded can be computed.
  • EncodeSerialize: Measures how fast Codec.Encode(Owner.Encoded) can be computed.
  • DoubleEncode: Measures how fast Owner.DoubleEncoded can be computed.
  • DoubleEncodeSerialize: Measures how fast Codec.Encode(Owner.DoubleEncoded) can be computed.
  • Decode, DeserializeDecode, DoubleDecode, DeserializeDoubleDecode: Measure the reverse of the above four.
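
Such a harness could be wired up with BenchmarkDotNet roughly as follows. The mock-data generation, element count, and helper names (CreateRandomOwner, the Decoding class from the sketch above) are placeholders rather than the actual setup used:

using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using Bencodex;
using Bencodex.Types;

public class EncodingBenchmarks
{
    private static readonly Codec _codec = new Codec();
    private static readonly Random _rng = new Random();

    private Owner[] _owners;
    private byte[][] _serialized;        // Codec.Encode(owner.Encoded)
    private byte[][] _doubleSerialized;  // Codec.Encode(owner.DoubleEncoded)

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder mock data; the actual benchmark used its own generator.
        _owners = Enumerable.Range(0, 100_000).Select(_ => CreateRandomOwner()).ToArray();
        _serialized = _owners.Select(o => _codec.Encode(o.Encoded)).ToArray();
        _doubleSerialized = _owners.Select(o => _codec.Encode(o.DoubleEncoded)).ToArray();
    }

    [Benchmark]
    public List<IValue> Encode() => _owners.Select(o => o.Encoded).ToList();

    [Benchmark]
    public List<byte[]> EncodeSerialize() =>
        _owners.Select(o => _codec.Encode(o.Encoded)).ToList();

    [Benchmark]
    public List<IValue> DoubleEncode() => _owners.Select(o => o.DoubleEncoded).ToList();

    [Benchmark]
    public List<byte[]> DoubleEncodeSerialize() =>
        _owners.Select(o => _codec.Encode(o.DoubleEncoded)).ToList();

    [Benchmark]  // uses the Decoding sketch above: deserialize bytes, then decode the IValue
    public List<Owner> DeserializeDecode() =>
        _serialized.Select(bytes => Decoding.Decode(_codec.Decode(bytes))).ToList();

    // Decode, DoubleDecode, and DeserializeDoubleDecode follow the same pattern.

    private static Owner CreateRandomOwner()
    {
        byte[] Rand() { var b = new byte[32]; _rng.NextBytes(b); return b; }
        return new Owner(new Member(Rand(), Rand()), new Member(Rand(), Rand()));
    }
}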

Some oddities that stand out:

  • For whatever reason, Encode is a lot slower than DoubleEncode. This is probably due to some quirks in the inner workings of Bencodex.Net, possibly from lazy evaluation and/or the internal immutable data structures used.
  • Again, for whatever reason, Encode is a lot slower than EncodeSerialize. I also have no idea how this can be the case; probably for a similar reason as above.
  • Then again, EncodeSerialize is faster than DoubleEncodeSerialize despite Encode being slower than DoubleEncode.
  • However, for Decode, DeserializeDecode, DoubleDecode, and DeserializeDoubleDecode, the speeds are as expected, i.e. proportional to the complexity of the operations required for each.

Issue & Suggestion

I would argue that the current implementation of Block<T>'s Bencodex representation uses a "double-encoding" scheme that is not particularly intuitive. Additionally, the scheme is not well defined, in the sense that it is unclear whether "double-encoding" should also be applied to the lower components of a Block<T> (IAction, Transaction<T>, and BlockCommit, to name a few). As seen above, there doesn't seem to be a clear performance benefit either, and there is the slight downside of larger encoded data as a result of "double-encoding". I suggest we move to a "single-encoding" scheme if possible; sadly, this would require another protocol version bump.
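
To illustrate the difference concretely, here is a hypothetical sketch of the two schemes for a block's transaction list. The names are made up for illustration; this is not libplanet's actual marshaling code:

using System.Collections.Generic;
using System.Linq;
using Bencodex;
using Bencodex.Types;

static class BlockMarshalingSketch
{
    private static readonly Codec _codec = new Codec();

    // Current "double-encoding": each transaction is encoded to an IValue,
    // serialized to bytes, and embedded in the block as an opaque Binary.
    // Decoding the block therefore requires a second Codec.Decode pass per entry.
    public static IValue DoubleEncodedTxs(IEnumerable<IValue> encodedTxs) =>
        new List(encodedTxs.Select(tx => (IValue)new Binary(_codec.Encode(tx))));

    // Suggested "single-encoding": each transaction's IValue is nested directly,
    // so the whole block is a single Bencodex tree and one Codec.Encode/Decode
    // pass covers everything.
    public static IValue SingleEncodedTxs(IEnumerable<IValue> encodedTxs) =>
        new List(encodedTxs);
}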

greymistcube avatar Mar 19 '23 14:03 greymistcube

As an addendum, the following are the results when encoding as List instead of Dictionary:

Method                        Mean       Error      StdDev      Median
Encode                   244.34 ms    4.883 ms    8.023 ms   242.97 ms
EncodeSerialize           76.34 ms    1.515 ms    3.884 ms    76.28 ms
DoubleEncode             121.65 ms    2.425 ms    3.478 ms   121.20 ms
DoubleEncodeSerialize    105.52 ms    2.004 ms    4.878 ms   105.29 ms
Decode                   314.03 ms    6.201 ms   14.977 ms   313.55 ms
DeserializeDecode        371.42 ms   12.797 ms   36.093 ms   367.47 ms
DoubleDecode             396.38 ms   11.225 ms   32.920 ms   384.43 ms
DeserializeDoubleDecode  373.17 ms   10.222 ms   29.164 ms   368.37 ms

That is, when we have encoded formats

[[0x1234, 0x2345], [0x3456, 0x4567]]

and

[0x5678, 0x6789]

respectively.
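
For reference, a sketch of what the List-based variant of the earlier classes could look like (field order replaces string keys; constructors are omitted for brevity):

using Bencodex;
using Bencodex.Types;

class MemberAsList
{
    public byte[] A { get; }
    public byte[] B { get; }

    // Keys disappear; position in the List identifies the field.
    public IValue Encoded =>
        new List(new IValue[] { new Binary(A), new Binary(B) });
}

class OwnerAsList
{
    private static readonly Codec _codec = new Codec();

    public MemberAsList X { get; }
    public MemberAsList Y { get; }

    public IValue Encoded =>
        new List(new[] { X.Encoded, Y.Encoded });

    public IValue DoubleEncoded =>
        new List(new IValue[]
        {
            new Binary(_codec.Encode(X.Encoded)),
            new Binary(_codec.Encode(Y.Encoded)),
        });
}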

A few things to note:

  • Overall performance is faster for Encode, EncodeSerialize, DoubleEncode, and DoubleEncodeSerialize, and the relative ordering is the same.
  • There isn't much performance gain for Decode, but DeserializeDecode, DoubleDecode, and DeserializeDoubleDecode are substantially faster.

greymistcube avatar Mar 20 '23 03:03 greymistcube

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

stale[bot] avatar May 22 '23 00:05 stale[bot]