bolts icon indicating copy to clipboard operation
bolts copied to clipboard

Fix currency UTF-8 test vector to actually test invalid UTF-8 sequences

Open erickcestari opened this issue 7 months ago • 6 comments

The previous test vector for "Malformed: invalid currency UTF-8" was incorrectly testing invalid length encoding (02 length with 2804 value) rather than testing invalid UTF-8 byte sequences in the currency field itself.

This commit replaces it with a test vector that:

  • Uses correct length encoding (03 length with proper 3-byte value)
  • Contains an actual invalid UTF-8 sequence in the currency data
  • Properly validates that parsers reject malformed UTF-8 in currency fields

erickcestari avatar May 30 '25 17:05 erickcestari

@thomash-acinq can you review this and check what eclair does?

t-bast avatar Jun 02 '25 09:06 t-bast

The current test vector uses the following TLV stream:

06 02 8041  // Type: 6 (offer_currency), Length: 2, Value: two bytes that form an invalid utf8 string
0a 05 414c494345  // Type: 10 (offer_description), Length: 5, Value: "ALICE"
16 21 020202020202020202020202020202020202020202020202020202020202020202  // Type: 22 (offer_issuer_id), Length: 33, Value : some public key

It does what it claims to do (test that the offer_currency is validated as a utf8 string) and I don't see any need to change it.

Your proposed change is

06 03 4d2aa4  // Type: 6 (offer_currency), Length: 3, Value: three bytes that form an invalid utf8 string
08 02 2710  // Type: 8 (offer_amount)
0a 0c 5465737420766563746f7273 // Type: 10 (offer_description)
16 21 02eec7245d6b7d2ccb30380bfbe2a3648cd7a942653f5aa340edcea1f283686619  // Type: 22 (offer_issuer_id)

which does not seem to be an improvement.

thomash-acinq avatar Jun 02 '25 12:06 thomash-acinq

The current test vector uses the following TLV stream:

06 02 8041  // Type: 6 (offer_currency), Length: 2, Value: two bytes that form an invalid utf8 string
0a 05 414c494345  // Type: 10 (offer_description), Length: 5, Value: "ALICE"
16 21 020202020202020202020202020202020202020202020202020202020202020202  // Type: 22 (offer_issuer_id), Length: 33, Value : some public key

It does what it claims to do (test that the offer_currency is validated as a utf8 string) and I don't see any need to change it.

Your proposed change is

06 03 4d2aa4  // Type: 6 (offer_currency), Length: 3, Value: three bytes that form an invalid utf8 string
08 02 2710  // Type: 8 (offer_amount)
0a 0c 5465737420766563746f7273 // Type: 10 (offer_description)
16 21 02eec7245d6b7d2ccb30380bfbe2a3648cd7a942653f5aa340edcea1f283686619  // Type: 22 (offer_issuer_id)

which does not seem to be an improvement.

Some applications (rust-lightning) validate currency codes by first checking for exactly 3 bytes before parsing the UTF-8 content. Since this test vector isn't 3 bytes long, those applications would reject it based on length rather than detecting the invalid UTF-8 encoding. Using a 3-byte invalid UTF-8 sequence would better test the UTF-8 validation logic specifically. I'll also remove the other fields to keep this test vector focused solely on offer_currency.

erickcestari avatar Jun 02 '25 12:06 erickcestari

OK, I understand the problem now. But if we really want to check that this field is a valid ISO 4217 code we should also add test vectors with symbols (US$), lowercase letters (usd) or even codes that look good but are not part of the standard (ABC). Eclair is not affected as it doesn't support currency conversion so it will reject any offer that sets offer_currency.

thomash-acinq avatar Jun 02 '25 13:06 thomash-acinq

OK, I understand the problem now. But if we really want to check that this field is a valid ISO 4217 code we should also add test vectors with symbols (US$), lowercase letters (usd) or even codes that look good but are not part of the standard (ABC). Eclair is not affected as it doesn't support currency conversion so it will reject any offer that sets offer_currency.

I have change the offer to lno1qcplllhapqpq86q2q4qkc6trv5tzzq6muh550qsfva9fdes0ruph7ctk2s8aqq06r4jxj3msc448wzwy9s

that have this data part: 0603fffefd080203e80a05416c6963651621035be5e9478209674a96e60f1f037f6176540fd001fa1d64694770c56a7709c42c

06 03 fffefd  // Type: 6 (offer_currency), Length: 3, Value: three bytes that form an invalid utf8 string
08 02 03e8 // Type: 8 (offer_amount), Length: 2, Value: 1000
0a 05 416c696365 // Type: 10 (offer_description), Length: 5, Value: "Alice"
16 21 035be5e9478209674a96e60f1f037f6176540fd001fa1d64694770c56a7709c42c  // Type: 22 (offer_issuer_id), Length: 33, Value : some public key

erickcestari avatar Jun 04 '25 12:06 erickcestari

@thomash-acinq @rustyrussell can you test this updated test vector?

t-bast avatar Jun 11 '25 09:06 t-bast

@thomash-acinq @rustyrussell ping?

t-bast avatar Jul 25 '25 07:07 t-bast

Eclair now supports the currency field and properly validates it. The new test case is correct (as was the old one), however I think it should be replaced by something that looks a lot more like a real currency code, like "usd" or "ABC". There is only a small number of acceptable values and checking that it's valid UTF8 feels like checking that a number is nonnegative when the only valid values are 0 and 1.

thomash-acinq avatar Jul 28 '25 08:07 thomash-acinq

Eclair now supports the currency field and properly validates it. The new test case is correct (as was the old one), however I think it should be replaced by something that looks a lot more like a real currency code, like "usd" or "ABC". There is only a small number of acceptable values and checking that it's valid UTF8 feels like checking that a number is nonnegative when the only valid values are 0 and 1.

I got a bit confused about what the spec wants to say about the ISO 4217 requirement. Does "as an ISO 4217 three-letter code" mean it should validate against the actual ISO 4217 currency list, or just follow the 3-letter ASCII uppercase format?

If it's full validation against real currencies, then testing with "ABC" makes sense since it would be rejected. But if it's just format validation, then "ABC" would actually be valid and the UTF-8 test is more appropriate for catching actual format violations.

The spec seems ambiguous on this point. Clarifying this would help ensure consistent behavior across implementations.

erickcestari avatar Jul 28 '25 11:07 erickcestari

For me "an ISO 4217 three-letter code" clearly means that it must be in the actual list, so "ABC" must fail, I don't find it ambiguous. That's how eclair does it.

thomash-acinq avatar Jul 28 '25 11:07 thomash-acinq

That list changes regularly (presumably), though. While I think it's reasonable to validate against, I don't think we'll bother trying to keep a list up to date. We just reject anything that isn't 3-char ASCII uppercase letters.

TheBlueMatt avatar Jul 28 '25 19:07 TheBlueMatt

We simply check that it's valid UTF-8 (it's part of our parsing code):

$ ./devtools/bolt12-cli decode lno1qcplllhapqpq86q2q4qkc6trv5tzzq6muh550qsfva9fdes0ruph7ctk2s8aqq06r4jxj3msc448wzwy9s
bolt12-cli: Bad offer: invalid offer data

If it's an unknown currency we will decode it and allow the user to fetch it (there are no requirements on the contents of this field unless "if it uses offer_amount to provide the user with a cost estimate:", which is the front end's responsibility):

lightning-cli decode lno1qcp5zsjrpqpzwyq2p32x2um5ypmx2cm5dae8x93pqthvwfzadd7jejes8q9lhc4rvjxd022zv5l44g6qah82ru5rdpnpj
{
   "type": "bolt12 offer",
   "offer_id": "f6b441732d02887b91b1f029ab308101e03f9e0c10ae3357f01045fb4970895d",
   "offer_currency": "ABC",
   "offer_amount": 10000,
   "warning_unknown_offer_currency": "unknown currency code",
   "offer_description": "Test vectors",
   "offer_issuer_id": "02eec7245d6b7d2ccb30380bfbe2a3648cd7a942653f5aa340edcea1f283686619",
   "valid": true
}

Note: we don't allow creation of such things (I had to use our external tool to make this): we can't parse "100ABC" because we don't know how many decimals. e.g. 100USD becomes 10,000 "offer_amount";

$ lightning-cli offer 1000ABC test-offer-bad-currency
{
   "code": -32602,
   "message": "Unknown currency suffix ABC"
}

Ack 5d3a0d57f6a3decebfc2abf74a99b1979d2b0630

rustyrussell avatar Jul 28 '25 21:07 rustyrussell

Yes it's inconvenient to have to keep the list up to date, however it's necessary to be able to understand the offer. The list is important not just to check that the currency is valid (we could just show the currency code to the end user who we can assume is already familiar with such codes) but to display the amount. If I create an offer that costs 50$ and you don't validate against the list, you will display 5000 USD instead of the correct 50.00 USD.

thomash-acinq avatar Jul 29 '25 08:07 thomash-acinq

Right, the actual user-facing logic really needs to know what the currency is (also to do conversion), we just punt that upstream rather than handling it ourselves (and it sounds like CLN does too).

TheBlueMatt avatar Jul 30 '25 12:07 TheBlueMatt