Fix currency UTF-8 test vector to actually test invalid UTF-8 sequences
The previous test vector for "Malformed: invalid currency UTF-8" was incorrectly testing invalid length encoding (02 length with 2804 value) rather than testing invalid UTF-8 byte sequences in the currency field itself.
This commit replaces it with a test vector that:
- Uses correct length encoding (03 length with proper 3-byte value)
- Contains an actual invalid UTF-8 sequence in the currency data
- Properly validates that parsers reject malformed UTF-8 in currency fields
@thomash-acinq can you review this and check what eclair does?
The current test vector uses the following TLV stream:
06 02 8041 // Type: 6 (offer_currency), Length: 2, Value: two bytes that form an invalid utf8 string
0a 05 414c494345 // Type: 10 (offer_description), Length: 5, Value: "ALICE"
16 21 020202020202020202020202020202020202020202020202020202020202020202 // Type: 22 (offer_issuer_id), Length: 33, Value : some public key
It does what it claims to do (test that the offer_currency is validated as a utf8 string) and I don't see any need to change it.
Your proposed change is
06 03 4d2aa4 // Type: 6 (offer_currency), Length: 3, Value: three bytes that form an invalid utf8 string
08 02 2710 // Type: 8 (offer_amount)
0a 0c 5465737420766563746f7273 // Type: 10 (offer_description)
16 21 02eec7245d6b7d2ccb30380bfbe2a3648cd7a942653f5aa340edcea1f283686619 // Type: 22 (offer_issuer_id)
which does not seem to be an improvement.
The current test vector uses the following TLV stream:
06 02 8041 // Type: 6 (offer_currency), Length: 2, Value: two bytes that form an invalid utf8 string 0a 05 414c494345 // Type: 10 (offer_description), Length: 5, Value: "ALICE" 16 21 020202020202020202020202020202020202020202020202020202020202020202 // Type: 22 (offer_issuer_id), Length: 33, Value : some public keyIt does what it claims to do (test that the
offer_currencyis validated as a utf8 string) and I don't see any need to change it.Your proposed change is
06 03 4d2aa4 // Type: 6 (offer_currency), Length: 3, Value: three bytes that form an invalid utf8 string 08 02 2710 // Type: 8 (offer_amount) 0a 0c 5465737420766563746f7273 // Type: 10 (offer_description) 16 21 02eec7245d6b7d2ccb30380bfbe2a3648cd7a942653f5aa340edcea1f283686619 // Type: 22 (offer_issuer_id)which does not seem to be an improvement.
Some applications (rust-lightning) validate currency codes by first checking for exactly 3 bytes before parsing the UTF-8 content. Since this test vector isn't 3 bytes long, those applications would reject it based on length rather than detecting the invalid UTF-8 encoding. Using a 3-byte invalid UTF-8 sequence would better test the UTF-8 validation logic specifically. I'll also remove the other fields to keep this test vector focused solely on offer_currency.
OK, I understand the problem now. But if we really want to check that this field is a valid ISO 4217 code we should also add test vectors with symbols (US$), lowercase letters (usd) or even codes that look good but are not part of the standard (ABC).
Eclair is not affected as it doesn't support currency conversion so it will reject any offer that sets offer_currency.
OK, I understand the problem now. But if we really want to check that this field is a valid ISO 4217 code we should also add test vectors with symbols (US$), lowercase letters (usd) or even codes that look good but are not part of the standard (ABC). Eclair is not affected as it doesn't support currency conversion so it will reject any offer that sets
offer_currency.
I have change the offer to lno1qcplllhapqpq86q2q4qkc6trv5tzzq6muh550qsfva9fdes0ruph7ctk2s8aqq06r4jxj3msc448wzwy9s
that have this data part:
0603fffefd080203e80a05416c6963651621035be5e9478209674a96e60f1f037f6176540fd001fa1d64694770c56a7709c42c
06 03 fffefd // Type: 6 (offer_currency), Length: 3, Value: three bytes that form an invalid utf8 string
08 02 03e8 // Type: 8 (offer_amount), Length: 2, Value: 1000
0a 05 416c696365 // Type: 10 (offer_description), Length: 5, Value: "Alice"
16 21 035be5e9478209674a96e60f1f037f6176540fd001fa1d64694770c56a7709c42c // Type: 22 (offer_issuer_id), Length: 33, Value : some public key
@thomash-acinq @rustyrussell can you test this updated test vector?
@thomash-acinq @rustyrussell ping?
Eclair now supports the currency field and properly validates it. The new test case is correct (as was the old one), however I think it should be replaced by something that looks a lot more like a real currency code, like "usd" or "ABC". There is only a small number of acceptable values and checking that it's valid UTF8 feels like checking that a number is nonnegative when the only valid values are 0 and 1.
Eclair now supports the currency field and properly validates it. The new test case is correct (as was the old one), however I think it should be replaced by something that looks a lot more like a real currency code, like "usd" or "ABC". There is only a small number of acceptable values and checking that it's valid UTF8 feels like checking that a number is nonnegative when the only valid values are 0 and 1.
I got a bit confused about what the spec wants to say about the ISO 4217 requirement. Does "as an ISO 4217 three-letter code" mean it should validate against the actual ISO 4217 currency list, or just follow the 3-letter ASCII uppercase format?
If it's full validation against real currencies, then testing with "ABC" makes sense since it would be rejected. But if it's just format validation, then "ABC" would actually be valid and the UTF-8 test is more appropriate for catching actual format violations.
The spec seems ambiguous on this point. Clarifying this would help ensure consistent behavior across implementations.
For me "an ISO 4217 three-letter code" clearly means that it must be in the actual list, so "ABC" must fail, I don't find it ambiguous. That's how eclair does it.
That list changes regularly (presumably), though. While I think it's reasonable to validate against, I don't think we'll bother trying to keep a list up to date. We just reject anything that isn't 3-char ASCII uppercase letters.
We simply check that it's valid UTF-8 (it's part of our parsing code):
$ ./devtools/bolt12-cli decode lno1qcplllhapqpq86q2q4qkc6trv5tzzq6muh550qsfva9fdes0ruph7ctk2s8aqq06r4jxj3msc448wzwy9s
bolt12-cli: Bad offer: invalid offer data
If it's an unknown currency we will decode it and allow the user to fetch it (there are no requirements on the contents of this field unless "if it uses offer_amount to provide the user with a cost estimate:", which is the front end's responsibility):
lightning-cli decode lno1qcp5zsjrpqpzwyq2p32x2um5ypmx2cm5dae8x93pqthvwfzadd7jejes8q9lhc4rvjxd022zv5l44g6qah82ru5rdpnpj
{
"type": "bolt12 offer",
"offer_id": "f6b441732d02887b91b1f029ab308101e03f9e0c10ae3357f01045fb4970895d",
"offer_currency": "ABC",
"offer_amount": 10000,
"warning_unknown_offer_currency": "unknown currency code",
"offer_description": "Test vectors",
"offer_issuer_id": "02eec7245d6b7d2ccb30380bfbe2a3648cd7a942653f5aa340edcea1f283686619",
"valid": true
}
Note: we don't allow creation of such things (I had to use our external tool to make this): we can't parse "100ABC" because we don't know how many decimals. e.g. 100USD becomes 10,000 "offer_amount";
$ lightning-cli offer 1000ABC test-offer-bad-currency
{
"code": -32602,
"message": "Unknown currency suffix ABC"
}
Ack 5d3a0d57f6a3decebfc2abf74a99b1979d2b0630
Yes it's inconvenient to have to keep the list up to date, however it's necessary to be able to understand the offer. The list is important not just to check that the currency is valid (we could just show the currency code to the end user who we can assume is already familiar with such codes) but to display the amount. If I create an offer that costs 50$ and you don't validate against the list, you will display 5000 USD instead of the correct 50.00 USD.
Right, the actual user-facing logic really needs to know what the currency is (also to do conversion), we just punt that upstream rather than handling it ourselves (and it sounds like CLN does too).