ethereumjs-monorepo
rlp: Better distinguish and handle null values
Thanks to @gzm55 for this comment originally posted in https://github.com/ethereumjs/rlp/issues/28#issuecomment-796767380
When using rlp, we need a two-level ser/der protocol:
- high level objects <---> rlp objects (byte string or nested list)
- rlp objects <---> byte string
The RLP spec mostly describes the latter, leaving it to the high level protocol to decide how to convert high level objects to rlp objects. The exception is how an unsigned integer, as a high level object, should be converted to an rlp object, since this is used to encode the list length. According to the implementation, an unsigned integer is converted to a byte string by removing all leading zero bytes from its big-endian representation. So:
| hl objects | rlp objects | byte string |
|---|---|---|
| integer 0 | '' (a zero-length byte string) | 0x80 (using rule 2) |
| integer 1 | '\x01' | 0x01 (using rule 1) |
| byte list [ ] | '' (a zero-length byte string) | 0x80 (using rule 2) |
| byte list [ 0x00 ] | '\x00' | 0x00 (using rule 1) |
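The rows of this table can be reproduced with a minimal sketch. It assumes only the two byte-string rules named in the table and payloads shorter than 56 bytes; `uintToBytes`, `encodeBytes` and `toHex` are hypothetical helper names, not functions of the rlp package.

```typescript
// High level rule from the comment: big-endian bytes with all leading zeros removed.
function uintToBytes(n: number): Uint8Array {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const digits: number[] = [];
  let rest = n;
  while (rest > 0) {
    digits.unshift(rest % 256);
    rest = Math.floor(rest / 256);
  }
  return new Uint8Array(digits); // integer 0 becomes the zero-length byte string
}

// rlp object (byte string) -> encoded bytes, short form only.
function encodeBytes(payload: Uint8Array): Uint8Array {
  // Rule 1: a single byte below 0x80 is its own encoding.
  if (payload.length === 1 && payload[0] < 0x80) {
    return payload.slice();
  }
  if (payload.length > 55) {
    throw new RangeError("long form is omitted in this sketch");
  }
  // Rule 2: prefix byte 0x80 + length, followed by the payload.
  const out = new Uint8Array(1 + payload.length);
  out[0] = 0x80 + payload.length;
  out.set(payload, 1);
  return out;
}

// Hex formatting helper for display.
function toHex(bytes: Uint8Array): string {
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    hex += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return hex;
}

console.log(toHex(encodeBytes(uintToBytes(0))));        // "80"
console.log(toHex(encodeBytes(uintToBytes(1))));        // "01"
console.log(toHex(encodeBytes(new Uint8Array([]))));    // "80"
console.log(toHex(encodeBytes(new Uint8Array([0x00])))); // "00"
```

Integer 0 and the empty byte list both end up as 0x80, so the encoded bytes alone cannot tell them apart.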
When doing rlp decoding a 0x80, the output should stay in the scope of rlp objects, so Buffer.from([]) seems reasonable. The high level decoding is left to the protocol implementation.
The real problem, imo, is that rlp and all its implementations do not distinguish null from common default values (0, empty list). Outside the Ethereum context, the missing null may introduce some minor problems.
In our own rlp implementation, each high level object type is defined as either a byte-string type (integer, string, float, etc.) or a nest-list type (list, map, struct), and null is converted as follows:
| hl objects | rlp objects | byte string |
|---|---|---|
| null (a byte-string type) | [] (an empty nest list) | 0xC0 |
| null (a nest-list type) | '' (an empty byte string) | 0x80 |
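A rough sketch of this convention follows, assuming a field's high level type is known when encoding and decoding. All names here (`RlpItem`, `bytesFieldToRlp`, `listFieldToRlp`, `rlpToBytesField`, `rlpToListField`, `encodeEmptyItem`) are hypothetical and illustrate the idea only; they are not taken from any published implementation.

```typescript
// An rlp object is either a byte string or a (possibly nested) list.
type RlpItem = Uint8Array | RlpItem[];

// Byte-string typed field: null is represented by the "wrong kind" of empty item, [].
function bytesFieldToRlp(value: Uint8Array | null): RlpItem {
  return value === null ? [] : value;
}

// Nest-list typed field: null is represented by the empty byte string ''.
function listFieldToRlp(value: RlpItem[] | null): RlpItem {
  return value === null ? new Uint8Array(0) : value;
}

// Decoding a byte-string typed field: an empty list means null.
function rlpToBytesField(item: RlpItem): Uint8Array | null {
  if (item instanceof Uint8Array) return item;
  if (item.length === 0) return null;
  throw new TypeError("expected a byte string or the null marker []");
}

// Decoding a nest-list typed field: an empty byte string means null.
function rlpToListField(item: RlpItem): RlpItem[] | null {
  if (!(item instanceof Uint8Array)) return item;
  if (item.length === 0) return null;
  throw new TypeError("expected a list or the null marker ''");
}

// Encoded byte of an empty rlp object: 0x80 for '' and 0xc0 for [].
function encodeEmptyItem(item: RlpItem): number {
  if (item.length !== 0) throw new RangeError("only empty items are handled in this sketch");
  return item instanceof Uint8Array ? 0x80 : 0xc0;
}

console.log(encodeEmptyItem(bytesFieldToRlp(null)).toString(16)); // "c0"
console.log(encodeEmptyItem(listFieldToRlp(null)).toString(16));  // "80"
```

Under this scheme an empty value of the field's own kind keeps its usual meaning, and only the empty item of the other kind is read as null.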
@ryanio the previous comment has a typo:
| byte list [ ] | '' (a zero-length byte string) | 0x00 (using rule 2) |
The last column should be 0x80 (using rule 2) instead of 0x00 (using rule 2).
Also, "When doing rlp decoding" would be better written more specifically as "When doing rlp decoding a 0x80" in this issue.
@gzm55 thanks, have updated
@ryanio according to the current definition, some byte points are still left undefined, which could be used for control purposes, especially to denote null values. These points are:
- 0x81 followed by [0x00, 0x0f], which should be encoded as a single [0x00, 0x0f]
- 0xb8 followed by [0x00, 0x37], which should be encoded as [0x80, 0xb7]
- [0xb9, 0xbf] followed by 0x00, the high byte of the length should not be zero
- 0xf8 followed by [0x00, 0x37], which should be encoded as [0xc0, 0xf7]
- [0xf9, 0xff] followed by 0x00, the high byte of the length should not be zero
Then we could safely define null to be encoded as 0xbf 0x00, which is compatible with the existing protocol.
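The listed byte points can be checked with a small predicate on the first two bytes of an encoding. This is only a sketch: `isUnusedPoint` and `inRange` are hypothetical names, and the ranges are copied as listed in the comment (the single-byte rule of the spec covers the wider range [0x00, 0x7f], but the sketch keeps [0x00, 0x0f] as written).

```typescript
function inRange(x: number, lo: number, hi: number): boolean {
  return x >= lo && x <= hi;
}

// Returns true when (first, second) starts one of the byte sequences that the
// comment lists as never produced by a canonical encoder.
function isUnusedPoint(first: number, second: number): boolean {
  // 0x81 + a byte that should have been encoded as itself (range as listed).
  if (first === 0x81) return inRange(second, 0x00, 0x0f);
  // Long-form prefix with a one-byte length that fits the short form.
  if (first === 0xb8 || first === 0xf8) return inRange(second, 0x00, 0x37);
  // Multi-byte length whose high byte is zero.
  if (inRange(first, 0xb9, 0xbf) || inRange(first, 0xf9, 0xff)) return second === 0x00;
  return false;
}

// The proposed null marker falls inside the unused space.
console.log(isUnusedPoint(0xbf, 0x00)); // true
// An ordinary short string prefix followed by any byte does not.
console.log(isUnusedPoint(0x82, 0x00)); // false
```

A decoder that wanted to support the proposed null could test for the 0xbf 0x00 pair before falling back to its normal long-string handling.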
@ryanio why is this issue closed?
@gzm55 out of staleness, if you would like to pursue any changes to the rlp package please open a new issue or PR with your suggestion. Thanks
@ryanio is it ok if I reopen this issue or are you cleaning up and want your issues closed and I should rather open a new one? 🙂
@holgerd77 if you don't mind opening a new one, that would be great, thank you!