
Clarify handling of null bytes (U+0000) in UTF-8 text fields across BOLT specifications

Open · erickcestari opened this issue 7 months ago

The current BOLT specifications require UTF-8 encoding for text fields (e.g., the BOLT11 description field and BOLT12 offer descriptions) but do not explicitly address null bytes (U+0000). Although null bytes are technically valid UTF-8, they cause interoperability problems across Lightning implementations and may also lead to security issues.

Current Specification Language

  • BOLT11: "MUST set d to a valid UTF-8 string" (BOLT11 spec)
  • BOLT01: "A writer MUST ensure an array of these is a valid UTF-8 string, a reader MAY reject any messages containing an array of these which is not a valid UTF-8 string" (BOLT01 spec)
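A reader that enforces only the rules quoted above has no reason to reject U+0000. The sketch below uses a hypothetical helper, `meets_current_utf8_rule`, that performs a strict UTF-8 validity check and nothing else. Python's strict decoder accepts the single 0x00 byte but rejects the two-byte form 0xC0 0x80 (the "modified UTF-8" encoding some platforms use), because that is an overlong sequence. A bare 0x00 byte is therefore the only valid UTF-8 encoding of U+0000.

```python
def meets_current_utf8_rule(raw: bytes) -> bool:
    """Hypothetical reader check mirroring the current BOLT01/BOLT11 wording:
    the field only has to be a valid UTF-8 string."""
    try:
        # Strict by default: malformed and overlong sequences raise.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(meets_current_utf8_rule(b"Test\x00vectors"))    # NUL passes the current rule
    print(meets_current_utf8_rule(b"Test\xc0\x80vectors"))  # overlong NUL is invalid UTF-8
```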

The Problem

Null bytes (U+0000) are valid Unicode code points and valid UTF-8, but they cause implementation issues:

  1. C/C++ implementations: code that stores text as NUL-terminated strings treats a null byte as the string terminator, silently truncating the field
  2. Inconsistent behavior: implementations differ in whether they truncate the field, reject it, or pass it through unchanged
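The three behaviours listed above can be modelled side by side. This is a minimal sketch with hypothetical function names. It is not taken from any Lightning implementation. The truncating reader uses `ctypes.c_char_p`, whose `.value` stops at the first NUL byte just as C string functions such as `strlen` do, to stand in for code that keeps text in a NUL-terminated buffer.

```python
import ctypes
from typing import Optional


def read_as_c_string(raw: bytes) -> bytes:
    """Truncate policy (hypothetical): what code treating the field as a
    NUL-terminated C string would keep."""
    # c_char_p.value reads up to, but not including, the first NUL byte.
    return ctypes.c_char_p(raw).value


def read_rejecting(raw: bytes) -> Optional[str]:
    """Reject policy (hypothetical): refuse any field that contains U+0000."""
    return None if b"\x00" in raw else raw.decode("utf-8")


def read_pass_through(raw: bytes) -> str:
    """Pass-through policy (hypothetical): keep every byte, NULs included."""
    return raw.decode("utf-8")


if __name__ == "__main__":
    field = b"Test\x00vectors"  # description from the BOLT12 test vector in this issue
    print("truncate:", read_as_c_string(field))
    print("reject:  ", read_rejecting(field))
    print("pass:    ", repr(read_pass_through(field)))
```

The same 12-byte field yields three different results, which is the interoperability gap the issue describes.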

For example, the BOLT11 invoice and BOLT12 offer below cannot be decoded by CLN (Core Lightning), whereas rust-lightning decodes both.

Test Vectors:

BOLT11:

lnbc100n1p70xwfzpp5qqqsyqcyq5rqwzqfqqqsyqcyq5rqwzqfqqqsyqcyq5rqwzqfqypqdrv2pkx2ctnv5sxxmmwwd5kgetjypeh2ursdae8g6twvus8g6rfwvs8qun0dfjkxaqqqpmkjargyph82mrvyp38jar9wvqx2mtzv4jxgetyqqqqnp4q0n326hr8v9zprg8gsvezcch06gfaqqhde2aj730yg0durunfhv66sp5qszsvpcgpyqsyps8pqysqqgzqvyqjqqpqgpsgpgqqypqxpq9qcrs9qrsgq2srkxv0a8uu02qvtcvlt5ex354axardkn8z0t59twhsk7qn660gqw0l8ygtfvpdnt8u892qhmp85eueccvnmxm7frkk9mzscfajvgfqq00jpr3

Description: Please consider supporting this project\x00\x00with null bytes\x00embedded\x00\x00

BOLT12 offer:

lno1pgx9getnwsq8vetrw3hhyucs5ypjgef743p5fzqq9nqxh0ah7y87rzv3ud0eleps9kl2d5348hq2k8qzqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgqpqqqqqqqqqqqqqqqqqqqqqqqqqqqzqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqqzq3zyg3zyg3zygs

Description: Test\x00vectors
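To confirm the embedded null bytes directly from the two test vectors, the description fields can be extracted with a minimal decoder. This is a sketch under stated assumptions: it expects well-formed input, skips bech32 checksum and signature verification, and ignores BOLT12 `+` continuations, so it is only suitable for inspection. All function and constant names are hypothetical. It follows the BOLT11 layout (7-word timestamp, tagged fields, 104-word signature, 6-character checksum, `d` tag for the description) and the BOLT12 layout (checksum-less bech32 over a TLV stream in which type 10 is `offer_description`).

```python
from typing import List, Optional, Tuple

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet, 5 bits per character

INVOICE = "lnbc100n1p70xwfzpp5qqqsyqcyq5rqwzqfqqqsyqcyq5rqwzqfqqqsyqcyq5rqwzqfqypqdrv2pkx2ctnv5sxxmmwwd5kgetjypeh2ursdae8g6twvus8g6rfwvs8qun0dfjkxaqqqpmkjargyph82mrvyp38jar9wvqx2mtzv4jxgetyqqqqnp4q0n326hr8v9zprg8gsvezcch06gfaqqhde2aj730yg0durunfhv66sp5qszsvpcgpyqsyps8pqysqqgzqvyqjqqpqgpsgpgqqypqxpq9qcrs9qrsgq2srkxv0a8uu02qvtcvlt5ex354axardkn8z0t59twhsk7qn660gqw0l8ygtfvpdnt8u892qhmp85eueccvnmxm7frkk9mzscfajvgfqq00jpr3"
OFFER = "lno1pgx9getnwsq8vetrw3hhyucs5ypjgef743p5fzqq9nqxh0ah7y87rzv3ud0eleps9kl2d5348hq2k8qzqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgqpqqqqqqqqqqqqqqqqqqqqqqqqqqqzqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqyqszqgpqqzq3zyg3zyg3zygs"


def words_to_bytes(words: List[int]) -> bytes:
    """Regroup 5-bit words into bytes, dropping trailing padding bits."""
    acc, bits, out = 0, 0, bytearray()
    for w in words:
        acc = (acc << 5) | w
        bits += 5
        if bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
            acc &= (1 << bits) - 1  # keep only the bits not yet emitted
    return bytes(out)


def bolt11_description(invoice: str) -> Optional[bytes]:
    """Raw bytes of the BOLT11 `d` field, or None if absent (no checksum check)."""
    _hrp, data = invoice.lower().rsplit("1", 1)
    words = [CHARSET.index(c) for c in data]
    # Layout: 7-word timestamp | tagged fields | 104-word signature | 6-word checksum.
    pos, end = 7, len(words) - 104 - 6
    while pos + 3 <= end:
        tag = words[pos]
        length = words[pos + 1] * 32 + words[pos + 2]
        field = words[pos + 3 : pos + 3 + length]
        if tag == CHARSET.index("d"):
            return words_to_bytes(field)
        pos += 3 + length
    return None


def read_bigsize(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a BigSize integer; return (value, next position)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(buf[pos + 1 : pos + 1 + width], "big"), pos + 1 + width


def bolt12_offer_description(offer: str) -> Optional[bytes]:
    """Raw bytes of offer_description (TLV type 10), or None if absent."""
    _hrp, data = offer.lower().rsplit("1", 1)
    tlv = words_to_bytes([CHARSET.index(c) for c in data])
    pos = 0
    while pos < len(tlv):
        tlv_type, pos = read_bigsize(tlv, pos)
        length, pos = read_bigsize(tlv, pos)
        value, pos = tlv[pos : pos + length], pos + length
        if tlv_type == 10:
            return value
    return None


if __name__ == "__main__":
    for name, desc in (("BOLT11", bolt11_description(INVOICE)),
                       ("BOLT12", bolt12_offer_description(OFFER))):
        print(name, repr(desc), "NUL bytes:", desc.count(b"\x00") if desc else 0)
```

Running it should print both descriptions with their embedded `\x00` bytes intact, matching the descriptions listed above.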

Proposed Solution

Explicitly prohibit null bytes: text fields MUST contain valid UTF-8 and MUST NOT contain null bytes (U+0000). Implementations MUST reject messages containing null bytes in text fields.
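A reader-side check for the proposed rule could look like the following minimal sketch. `validate_text_field` and `TextFieldError` are hypothetical names, not part of any existing implementation. Searching the raw bytes for 0x00 is sufficient because, in UTF-8, every byte of a multi-byte sequence is 0x80 or higher, so 0x00 can only ever encode U+0000.

```python
class TextFieldError(ValueError):
    """Hypothetical error raised when a text field violates the proposed rule."""


def validate_text_field(raw: bytes) -> str:
    """Proposed rule: the field MUST be valid UTF-8 and MUST NOT contain U+0000."""
    try:
        text = raw.decode("utf-8")  # strict decoding rejects malformed input
    except UnicodeDecodeError as exc:
        raise TextFieldError(f"not valid UTF-8: {exc}") from exc
    # A byte-level search is equivalent to a code-point search for U+0000.
    nul_at = raw.find(b"\x00")
    if nul_at != -1:
        raise TextFieldError(f"contains U+0000 at byte offset {nul_at}")
    return text


if __name__ == "__main__":
    for candidate in ("Café".encode("utf-8"), b"Test\x00vectors", b"\xff"):
        try:
            print("accepted:", validate_text_field(candidate))
        except TextFieldError as err:
            print("rejected:", err)
```

Rejecting at decode time, rather than stripping or truncating, keeps every implementation's view of the field identical: the field is either processed as-is by everyone or refused by everyone.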

erickcestari · May 23 '25 16:05