ethereumjs-monorepo rlp: Better distinguish and handle null values

Thanks to @gzm55 for this comment originally posted in https://github.com/ethereumjs/rlp/issues/28#issuecomment-796767380

When using RLP, we need a two-level serialization/deserialization protocol:

high-level objects <---> RLP objects (byte string or nested list)
RLP objects <---> byte string

The RLP spec mostly describes the latter, leaving the high-level protocol to decide how to convert high-level objects to RLP objects. The exception is how an unsigned integer, as a high-level object, should be converted to an RLP object, because this is used to encode the list length. According to the implementation, an unsigned integer is converted to a byte string by removing all leading zero bytes from its big-endian representation. So:

| HL object | RLP object | Byte string |
| --- | --- | --- |
| integer 0 | `''` (a zero-length byte string) | 0x80 (using rule 2) |
| integer 1 | `'\x01'` | 0x01 (using rule 1) |
| byte list [ ] | `''` (a zero-length byte string) | 0x80 (using rule 2) |
| byte list [ 0x00 ] | `'\x00'` | 0x00 (using rule 1) |
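
As a rough illustration of the two levels, the sketch below strips leading zero bytes from an unsigned integer and then applies the byte-string rules. The helper names (`uintToBytes`, `encodeBytes`, `toHex`) are hypothetical and are not the API of the ethereumjs rlp package; the integer range is limited to safe JavaScript integers for simplicity.

```typescript
// Hypothetical helpers for illustration; not the ethereumjs rlp API.

/** High level: big-endian bytes of an unsigned integer, leading zero bytes removed. */
function uintToBytes(value: number): Uint8Array {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const out: number[] = [];
  for (let v = value; v > 0; v = Math.floor(v / 256)) {
    out.unshift(v % 256);
  }
  return new Uint8Array(out); // 0 yields a zero-length byte string
}

/** Low level: RLP-encode a byte string. */
function encodeBytes(bytes: Uint8Array): Uint8Array {
  // Rule 1: a single byte below 0x80 is its own encoding.
  if (bytes.length === 1 && bytes[0] < 0x80) return bytes;
  let header: number[];
  if (bytes.length <= 55) {
    // Rule 2: prefix 0x80 + length for strings of 0 to 55 bytes.
    header = [0x80 + bytes.length];
  } else {
    // Longer strings: prefix 0xb7 + length-of-length, followed by the length itself.
    const len = Array.from(uintToBytes(bytes.length));
    header = [0xb7 + len.length].concat(len);
  }
  const out = new Uint8Array(header.length + bytes.length);
  out.set(header, 0);
  out.set(bytes, header.length);
  return out;
}

/** Format bytes as a hex string for display. */
function toHex(bytes: Uint8Array): string {
  return "0x" + Array.from(bytes).map((b) => (b < 16 ? "0" : "") + b.toString(16)).join("");
}

console.log(toHex(encodeBytes(uintToBytes(0))));            // 0x80
console.log(toHex(encodeBytes(uintToBytes(1))));            // 0x01
console.log(toHex(encodeBytes(new Uint8Array([]))));        // 0x80
console.log(toHex(encodeBytes(new Uint8Array([0x00]))));    // 0x00
```

Integer 0 and the empty byte list produce the same output, which is the ambiguity this issue is about.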

When RLP-decoding a 0x80, the output should stay within the scope of RLP objects, so returning Buffer.from([]) seems reasonable. The high-level decoding is then left to the protocol implementation.
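
A minimal sketch of that split on the decoding side, assuming hypothetical helpers `decodeShortString` and `bytesToUint` (neither is part of the rlp package): the low-level step returns only bytes, and the caller decides whether those bytes mean an integer or a byte list. It handles only single-byte and short string items and does not reject non-canonical input.

```typescript
// Hypothetical helpers for illustration only.

/** Low level: decode a single-byte or short (0 to 55 byte) RLP string item. */
function decodeShortString(encoded: Uint8Array): Uint8Array {
  if (encoded.length === 0) throw new Error("empty input");
  const prefix = encoded[0];
  if (prefix < 0x80) {
    if (encoded.length !== 1) throw new Error("trailing bytes");
    return encoded.slice(0, 1);
  }
  if (prefix <= 0xb7) {
    const length = prefix - 0x80;
    if (encoded.length !== 1 + length) throw new Error("length mismatch");
    return encoded.slice(1);
  }
  throw new Error("long strings and lists are outside this sketch");
}

/** High level: interpret decoded bytes as a big-endian unsigned integer. */
function bytesToUint(bytes: Uint8Array): number {
  if (bytes.length > 6) throw new RangeError("too large for this sketch");
  let value = 0;
  for (let i = 0; i < bytes.length; i++) {
    value = value * 256 + bytes[i];
  }
  return value;
}

const raw = decodeShortString(new Uint8Array([0x80]));
console.log(raw.length);        // 0, an empty byte string at the RLP level
console.log(bytesToUint(raw));  // 0, if the protocol expects an integer here
console.log(Array.from(raw));   // [], if the protocol expects a byte list here
```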

The real problem, imo, is that RLP and all of its implementations do not distinguish null from common default values (0, empty list). Outside the Ethereum context, the missing null can be expected to cause some minor problems.

In our own RLP implementation, each high-level object type is defined as either a byte-string type (integer, string, float, etc.) or a nest-list type (list, map, struct), and null is converted as follows:

| HL object | RLP object | Byte string |
| --- | --- | --- |
| null (a byte-string type) | `[]` (an empty nest list) | `0xC0` |
| null (a nest-list type) | `''` (an empty byte string) | `0x80` |
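
One way to read this convention: the null marker for a type is borrowed from the other family of RLP objects, so it cannot collide with any real value of that type. The sketch below illustrates this with hypothetical names (`Kind`, `encodeNull`, `isNull`); it is not taken from the commenter's implementation.

```typescript
// Hypothetical names for illustration; the commenter's actual code is not shown in this thread.
type Kind = "byte-string" | "nest-list";

/** Marker byte used for null, chosen from the opposite RLP family. */
function nullMarker(kind: Kind): number {
  // A byte-string type never encodes to a list, so the empty list (0xc0) is free.
  // A nest-list type never encodes to a string, so the empty string (0x80) is free.
  return kind === "byte-string" ? 0xc0 : 0x80;
}

function encodeNull(kind: Kind): Uint8Array {
  return new Uint8Array([nullMarker(kind)]);
}

function isNull(kind: Kind, encoded: Uint8Array): boolean {
  return encoded.length === 1 && encoded[0] === nullMarker(kind);
}

// For an integer field, 0x80 is still the value 0 and only 0xc0 means null.
console.log(isNull("byte-string", new Uint8Array([0x80]))); // false
console.log(isNull("byte-string", new Uint8Array([0xc0]))); // true
// For a list field, 0xc0 is still the empty list and only 0x80 means null.
console.log(isNull("nest-list", new Uint8Array([0xc0])));   // false
console.log(isNull("nest-list", new Uint8Array([0x80])));   // true
```

The decoder needs to know the expected high-level type in advance, since the same byte means null for one kind and a default value for the other.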

Jan 03 '22 22:01 ryanio

@ryanio the previous comment has a typo:

| byte list [ ] | '' (a zero-length byte string) | 0x00 (using rule 2) |

the last column should be 0x80 (using rule 2) instead of 0x00 (using rule 2)

Also, "When doing rlp decoding" would be better written more specifically as "When doing rlp decoding a 0x80" in this issue.

Jan 04 '22 03:01 gzm55

@gzm55 thanks, have updated

Jan 04 '22 05:01 ryanio

@ryanio according to the current definition, some byte points are still left undefined, and these could be used for control purposes, especially to denote null values. These points are:

0x81 followed by [0x00, 0x7f], which should be encoded as a single [0x00, 0x7f]
0xb8 followed by [0x00, 0x37], which should be encoded as [0x80, 0xb7]
[0xb9, 0xbf] followed by 0x00, the high byte of the length should not be zero
0xf8 followed by [0x00, 0x37], which should be encoded as [0xc0, 0xf7]
[0xf9, 0xff] followed by 0x00, the high byte of the length should not be zero

We could then safely define null to be encoded as 0xbf 0x00, which remains compatible with the existing protocol.
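
The undefined byte points can be recognised from the first two bytes of an item. The following is only a sketch with hypothetical function names (`isUndefinedBytePoint`, `isProposedNull`); the single-byte case is taken as the full range [0x00, 0x7f], based on the RLP rule that such bytes are their own encoding.

```typescript
// Hypothetical helpers for illustration; not part of any existing rlp package.

/** True if the first two bytes form a sequence that canonical RLP never produces. */
function isUndefinedBytePoint(encoded: Uint8Array): boolean {
  if (encoded.length < 2) return false;
  const first = encoded[0];
  const second = encoded[1];
  // 0x81 + a byte below 0x80: that byte should have been encoded as itself.
  if (first === 0x81) return second < 0x80;
  // Long form with a length of 0 to 55: the short form should have been used.
  if (first === 0xb8 || first === 0xf8) return second <= 0x37;
  // Multi-byte length whose high byte is zero.
  if ((first >= 0xb9 && first <= 0xbf) || first >= 0xf9) return second === 0x00;
  return false;
}

/** The null marker proposed in this comment: 0xbf 0x00. */
function isProposedNull(encoded: Uint8Array): boolean {
  return encoded.length === 2 && encoded[0] === 0xbf && encoded[1] === 0x00;
}

console.log(isUndefinedBytePoint(new Uint8Array([0xbf, 0x00]))); // true
console.log(isUndefinedBytePoint(new Uint8Array([0x81, 0x80]))); // false
console.log(isProposedNull(new Uint8Array([0xbf, 0x00])));       // true
```

Because 0xbf 0x00 falls in the undefined set, a decoder that checks for it first would not misread any canonically encoded value, though a strict existing decoder would presumably reject it as invalid input.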

Feb 16 '22 06:02 gzm55

@ryanio why is this issue closed?

Jun 29 '23 00:06 gzm55

@gzm55 out of staleness, if you would like to pursue any changes to the rlp package please open a new issue or PR with your suggestion. Thanks

Jun 29 '23 02:06 ryanio

@ryanio is it ok if I reopen this issue or are you cleaning up and want your issues closed and I should rather open a new one? 🙂

Jun 29 '23 10:06 holgerd77

@holgerd77 if you don't mind opening a new one, that would be great, thank you!

Jun 29 '23 13:06 ryanio
