ethereumjs-monorepo

rlp: Better distinguish and handle null values

Open • ryanio opened this issue 3 years ago • 3 comments

Thanks to @gzm55 for this comment originally posted in https://github.com/ethereumjs/rlp/issues/28#issuecomment-796767380

When using RLP, we need a two-level serialization/deserialization protocol:

  • high-level objects <---> RLP objects (byte strings or nested lists)
  • RLP objects <---> byte string

The RLP spec mostly describes the latter, leaving it to the higher-level protocol to decide how high-level objects are converted to RLP objects. The exception is how an unsigned integer, as a high-level object, is converted to an RLP object, because the spec itself uses this conversion to encode lengths. According to the implementation, an unsigned integer is converted to a byte string by removing all leading zero bytes from its big-endian representation. So:

| high-level object | RLP object | encoded byte string |
|---|---|---|
| integer 0 | '' (a zero-length byte string) | 0x80 (using rule 2) |
| integer 1 | '\x01' | 0x01 (using rule 1) |
| byte list [ ] | '' (a zero-length byte string) | 0x80 (using rule 2) |
| byte list [ 0x00 ] | '\x00' | 0x00 (using rule 1) |
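The two levels behind these rows can be sketched in TypeScript. This is a minimal illustration, not the ethereumjs rlp API: `uintToBytes`, `encodeBytes`, and `toHex` are hypothetical helper names, plain `Uint8Array` stands in for `Buffer`, integers are limited to JS safe integers, and lists are left out.

```typescript
// Hypothetical helpers for illustration only; not the ethereumjs rlp API.

// High-level -> RLP object: an unsigned integer becomes its big-endian
// bytes with all leading zero bytes removed (so 0 becomes '').
function uintToBytes(n: number): Uint8Array {
  if (Math.floor(n) !== n || n < 0 || n > 9007199254740991) {
    throw new Error("expected a non-negative safe integer");
  }
  const bytes: number[] = [];
  let rest = n;
  while (rest > 0) {
    bytes.unshift(rest % 256); // prepend the lowest remaining byte
    rest = Math.floor(rest / 256);
  }
  return new Uint8Array(bytes);
}

// RLP object -> byte string, for byte strings only (lists are omitted).
function encodeBytes(input: Uint8Array): Uint8Array {
  // Rule 1: a single byte in [0x00, 0x7f] is its own encoding.
  if (input.length === 1 && input[0] < 0x80) {
    return new Uint8Array([input[0]]);
  }
  // Rule 2: 0 to 55 bytes get the one-byte prefix 0x80 + length.
  if (input.length <= 55) {
    const out = new Uint8Array(1 + input.length);
    out[0] = 0x80 + input.length;
    out.set(input, 1);
    return out;
  }
  // Longer strings: prefix 0xb7 + (byte length of the length), then the length.
  const lenBytes = uintToBytes(input.length);
  const result = new Uint8Array(1 + lenBytes.length + input.length);
  result[0] = 0xb7 + lenBytes.length;
  result.set(lenBytes, 1);
  result.set(input, 1 + lenBytes.length);
  return result;
}

function toHex(bytes: Uint8Array): string {
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    const h = bytes[i].toString(16);
    hex += h.length === 1 ? "0" + h : h;
  }
  return hex;
}

// The four rows of the table:
console.log(toHex(encodeBytes(uintToBytes(0)))); // 80
console.log(toHex(encodeBytes(uintToBytes(1)))); // 01
console.log(toHex(encodeBytes(new Uint8Array([])))); // 80
console.log(toHex(encodeBytes(new Uint8Array([0x00])))); // 00
```

Note that integer 0 and the empty byte list produce the same bytes, which is exactly the ambiguity discussed in this issue.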

When doing rlp decoding a 0x80, the output should stay within the scope of RLP objects, so returning Buffer.from([]) (an empty byte string) seems reasonable. The high-level decoding is then left to the protocol implementation.
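A hedged TypeScript sketch of that split, using hypothetical function names and `Uint8Array` in place of `Buffer`, and handling only single bytes and short strings: RLP decoding of 0x80 stops at an empty byte string, and the protocol then chooses whether to read it as integer 0 or as an empty byte list. Neither reading can tell that the sender might have meant null.

```typescript
// Hypothetical minimal decoder for a single RLP byte string; only rule 1
// (single byte) and rule 2 (prefix 0x80..0xb7) are handled.
function decodeShortString(encoded: Uint8Array): Uint8Array {
  if (encoded.length === 0) throw new Error("empty input");
  const prefix = encoded[0];
  if (prefix < 0x80) return encoded.slice(0, 1); // rule 1
  if (prefix <= 0xb7) {
    const len = prefix - 0x80; // rule 2
    if (encoded.length < 1 + len) throw new Error("truncated input");
    return encoded.slice(1, 1 + len);
  }
  throw new Error("long strings and lists are not handled in this sketch");
}

// High-level decoding, chosen by the protocol rather than by RLP.
function asUint(bytes: Uint8Array): number {
  let n = 0;
  for (let i = 0; i < bytes.length; i++) n = n * 256 + bytes[i];
  return n;
}

function asByteList(bytes: Uint8Array): number[] {
  const list: number[] = [];
  for (let i = 0; i < bytes.length; i++) list.push(bytes[i]);
  return list;
}

const item = decodeShortString(new Uint8Array([0x80]));
console.log(item.length); // 0: an empty byte string
console.log(asUint(item)); // 0
console.log(asByteList(item)); // []
```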

The real problem, imo, is that RLP and all of its implementations do not distinguish null from common default values (0, the empty list). Outside of Ethereum, the missing null can cause some minor problems.

In our own RLP implementation, each high-level object type is defined as either a byte-string type (integer, string, float, etc.) or a nested-list type (list, map, struct), and null is converted as follows:

| high-level object | RLP object | encoded byte string |
|---|---|---|
| null (of a byte-string type) | [] (an empty nested list) | 0xc0 |
| null (of a nested-list type) | '' (an empty byte string) | 0x80 |
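A hedged TypeScript sketch of this scheme, assuming each field's kind is known from its declared type. `Kind`, `encodeNull`, and `classify` are hypothetical names, and only the one-byte null markers are handled; null is written as the empty item of the opposite kind, so 0x80 remains a regular value for byte-string fields and 0xc0 remains a regular value for list fields.

```typescript
// Hypothetical: every high-level type is declared as a byte-string kind
// (integer, string, float, ...) or a list kind (list, map, struct).
type Kind = "bytes" | "list";

const EMPTY_STRING = 0x80; // RLP encoding of ''
const EMPTY_LIST = 0xc0; // RLP encoding of []

// null is written as the empty item of the opposite kind.
function encodeNull(kind: Kind): Uint8Array {
  return new Uint8Array([kind === "bytes" ? EMPTY_LIST : EMPTY_STRING]);
}

// Decide whether an encoded item is a regular value or the null marker
// for a field of the given kind. Items starting below 0xc0 are strings.
function classify(kind: Kind, encoded: Uint8Array): "value" | "null" {
  if (encoded.length === 0) throw new Error("empty input");
  const first = encoded[0];
  const isList = first >= 0xc0;
  const kindMatches = kind === "list" ? isList : !isList;
  if (kindMatches) return "value";
  // The only allowed item of the wrong kind is the empty one used for null.
  const nullByte = kind === "bytes" ? EMPTY_LIST : EMPTY_STRING;
  if (encoded.length === 1 && first === nullByte) return "null";
  throw new Error("item kind does not match the declared type");
}

console.log(classify("bytes", new Uint8Array([0x80]))); // value (integer 0 or '')
console.log(classify("bytes", encodeNull("bytes"))); // null
console.log(classify("list", new Uint8Array([0xc0]))); // value (empty list)
console.log(classify("list", encodeNull("list"))); // null
```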

ryanio, Jan 03 '22 22:01

@ryanio the previous comment has a typo:

| byte list [ ] | '' (a zero-length byte string) | 0x00 (using rule 2) |

The last column should be 0x80 (using rule 2) instead of 0x00 (using rule 2).

Also, "When doing rlp decoding" would be better written more specifically as "When doing rlp decoding a 0x80" in this issue.

gzm55, Jan 04 '22 03:01

@gzm55 thanks, have updated

ryanio, Jan 04 '22 05:01

@ryanio according to the current definition, some byte sequences are still left undefined (they never occur in a canonical encoding), and they could be used for control purposes, especially to denote null values. These sequences are:

  • 0x81 followed by a byte in [0x00, 0x7f], which should be encoded as that single byte
  • 0xb8 followed by a byte in [0x00, 0x37], which should be encoded with a single prefix byte in [0x80, 0xb7]
  • [0xb9, 0xbf] followed by 0x00, because the high byte of the length must not be zero
  • 0xf8 followed by a byte in [0x00, 0x37], which should be encoded with a single prefix byte in [0xc0, 0xf7]
  • [0xf9, 0xff] followed by 0x00, because the high byte of the length must not be zero
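These patterns can be checked with a small predicate. This hedged TypeScript sketch uses the hypothetical name `hasNonCanonicalPrefix`, looks only at the first two bytes of an item (it is not a full RLP validator), and treats the 0x81 case as covering every byte in [0x00, 0x7f], since any single byte below 0x80 is its own RLP encoding.

```typescript
// Hypothetical helper: reports whether an encoded item starts with one of
// the non-canonical byte patterns. Only the first two bytes are inspected.
function hasNonCanonicalPrefix(encoded: Uint8Array): boolean {
  if (encoded.length < 2) return false;
  const first = encoded[0];
  const second = encoded[1];
  // 0x81 + a byte below 0x80: that byte should be its own encoding.
  if (first === 0x81 && second < 0x80) return true;
  // 0xb8 / 0xf8 + a length of 0..55: the short-form prefix should be used.
  if ((first === 0xb8 || first === 0xf8) && second <= 0x37) return true;
  // Multi-byte lengths (prefixes 0xb9..0xbf and 0xf9..0xff) must not
  // start with a zero byte.
  if (((first >= 0xb9 && first <= 0xbf) || first >= 0xf9) && second === 0x00) {
    return true;
  }
  return false;
}

console.log(hasNonCanonicalPrefix(new Uint8Array([0x81, 0x05]))); // true
console.log(hasNonCanonicalPrefix(new Uint8Array([0x81, 0x80]))); // false
```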

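Then we could safely define null as the encoding 0xbf 0x00: 0xbf announces an 8-byte length, and a length must not start with a zero byte, so no canonical encoding begins this way and the scheme stays compatible with the existing protocol.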
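A hedged TypeScript sketch of how a protocol might use that marker for a nullable byte string. `encodeNullableBytes` and `decodeNullableBytes` are hypothetical names, each input is assumed to hold exactly one item, and only strings of at most 55 bytes are handled. The point is that null and the empty byte string now encode differently.

```typescript
// Hypothetical null marker from the proposal: 0xbf announces an 8-byte
// length, and a canonical length never starts with 0x00, so no valid
// encoding begins with these two bytes.
const NULL_MARKER = [0xbf, 0x00];

function encodeNullableBytes(value: Uint8Array | null): Uint8Array {
  if (value === null) return new Uint8Array(NULL_MARKER);
  // Sketch limitation: only rules 1 and 2 (at most 55 bytes).
  if (value.length === 1 && value[0] < 0x80) return new Uint8Array([value[0]]);
  if (value.length > 55) throw new Error("long strings omitted from this sketch");
  const out = new Uint8Array(1 + value.length);
  out[0] = 0x80 + value.length;
  out.set(value, 1);
  return out;
}

function decodeNullableBytes(encoded: Uint8Array): Uint8Array | null {
  // Check for the null marker before normal decoding.
  if (encoded.length === 2 && encoded[0] === NULL_MARKER[0] && encoded[1] === NULL_MARKER[1]) {
    return null;
  }
  if (encoded.length === 0) throw new Error("empty input");
  const prefix = encoded[0];
  if (prefix < 0x80) return encoded.slice(0, 1);
  if (prefix <= 0xb7 && encoded.length >= 1 + (prefix - 0x80)) {
    return encoded.slice(1, 1 + (prefix - 0x80));
  }
  throw new Error("unsupported or truncated input in this sketch");
}

console.log(decodeNullableBytes(encodeNullableBytes(null))); // null
console.log(decodeNullableBytes(encodeNullableBytes(new Uint8Array([])))?.length); // 0
```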

gzm55, Feb 16 '22 06:02

@ryanio why is this issue closed?

gzm55, Jun 29 '23 00:06

@gzm55 it was closed due to staleness. If you would like to pursue any changes to the rlp package, please open a new issue or PR with your suggestion. Thanks

ryanio, Jun 29 '23 02:06

@ryanio is it ok if I reopen this issue or are you cleaning up and want your issues closed and I should rather open a new one? 🙂

holgerd77, Jun 29 '23 10:06

@holgerd77 if you don't mind opening a new one, that would be great, thank you!

ryanio, Jun 29 '23 13:06