rust-lightning Own implementation of Bech32::u5 type, upgrade of bech32 dependency

Own implementation of u5 type and conversions. Upgrade of bech32 dependency, from 0.9.1 to 0.11.0.

Fixes #3176 and #3195 . Obsoletes #3181 .

Changes summary:

Introduced lightning::util::bech32::u5 type, for holding a 32-bit value, taken over from bech32 v0.9.1 (This type is no longer available in bech32` crate)
It has simple type conversions (from/to u8), Bech32 character encoding/decoding (from/to char), ...
... unpacking from packed bytes and packing to bytes (iterative), traits for conversions.
External crate bech32 is still used directly in a few places; it has been upgraded to 0.11.0.

More details in #3195 .

Potential further improvements:

Encapsulate checksum functionality into util/bech32.rs, by calling into external bech32 implementation
Add HRP decoding/encoding functionality to bech32/mod.rs, by calling into external crate, or better, by own implementation (simple string parsing / concatenation)
Move tagged parsing logic to bech32/mod.rs
Add test vectors from lightning invoice spec (BOLT11)
Isolate bech32 functionality into a standalone crate (e.g. bech32-ligthning).
Remove dependency to external crate completely

TODO:

[x] fix fuzz target
[x] fmt
[ ] squash

Jul 23 '24 13:07 optout21

Codecov Report

Attention: Patch coverage is 94.52888% with 18 lines in your changes missing coverage. Please review.

Project coverage is 91.36%. Comparing base (78c0eaa) to head (c8279fe). Report is 206 commits behind head on main.

Files	Patch %	Lines
lightning/src/util/bech32.rs	97.09%	8 Missing :warning:
lightning-invoice/src/de.rs	84.00%	2 Missing and 2 partials :warning:
lightning/src/offers/parse.rs	63.63%	2 Missing and 2 partials :warning:
lightning-invoice/src/ser.rs	83.33%	0 Missing and 1 partial :warning:
lightning/src/util/test_utils.rs	0.00%	1 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3201      +/-   ##
==========================================
+ Coverage   89.80%   91.36%   +1.56%     
==========================================
  Files         121      123       +2     
  Lines      100045   113913   +13868     
  Branches   100045   113913   +13868     
==========================================
+ Hits        89845   104078   +14233     
+ Misses       7533     7289     -244     
+ Partials     2667     2546     -121

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jul 23 '24 13:07 codecov[bot]

Also, do you think this could possibly be split up into two commits? First the introduction of the new type, and then switching to it? Or would it cause too much havoc with check_commits and needing to add and then remove allow_unused?

Jul 24 '24 12:07 arik-so

Also, do you think this could possibly be split up into two commits? First the introduction of the new type, and then switching to it? Or would it cause too much havoc with check_commits and needing to add and then remove allow_unused?

It is possible to split up into two commits, it helps review, but check_commits will probably complain, and it should be squashed before merging. Or did you have in mind 2 separate PRs? Also, what would be the benefit? Since the addition is in a new file, it is quite easy to separate during review.

Jul 25 '24 15:07 optout21

It is possible to split up into two commits, ...

Actually, it's not really possible to separate the external upgrade and the new own implementation:

if external crate is upgraded, code has to be adopted throughout, which does require the new own implementation
if the old version of the external crate is kept, the new implementation has to be adapted to the new version, which does not make much sense.

Jul 25 '24 15:07 optout21

I'm a little confused about the motivation for this PR. Upstream you complained that Fe32 did not have Ord or Default, but if I remove those impls from your u5 nothing fails to compile in lightning except your test for Default being zero. It appears that you use these impls in lightning-invoice within ser.rs in a couple places:

One is to implement extend_from_stretch generically over T: Default. But actually you only call this function with u8 and u5, and when you call it with u5 the semantics are slightly different (in that you have this Optional return but you always call .expect on it). A short-diff fix for this is

diff --git a/lightning-invoice/src/ser.rs b/lightning-invoice/src/ser.rs
index 7317b5cc3..4e9bce317 100644
--- a/lightning-invoice/src/ser.rs
+++ b/lightning-invoice/src/ser.rs
@@ -177,6 +177,22 @@ impl Display for SiPrefix {
 	}
 }
 
+fn encode_int_be_base32_fixed(int: u64, size: usize) -> Vec<u5> {
+	let base = 32u64;
+
+	let mut out_vec = Vec::<u5>::new();
+
+	let mut rem_int = int;
+	for _ in 0..size {
+		out_vec.push(u5::try_from_u8((rem_int % base) as u8).expect("always <32"));
+		rem_int /= base;
+	}
+	assert_eq!(rem_int, 0, "input {} did not fit into {} u5 words (remainder {})", int, size, rem_int);
+
+	out_vec.reverse();
+	out_vec
+}
+
 fn encode_int_be_base32(int: u64) -> Vec<u5> {
 	let base = 32u64;
 
@@ -250,10 +266,7 @@ impl ToBase32 for RawDataPart {
 impl ToBase32 for PositiveTimestamp {
 	fn write_base32<W: WriteBase32>(&self, writer: &mut W) -> Result<(), <W as WriteBase32>::Err> {
 		// FIXME: use writer for int encoding
-		writer.write(
-			&try_stretch(encode_int_be_base32(self.as_unix_timestamp()), 7)
-				.expect("Can't be longer due than 7 u5s due to timestamp bounds")
-		)
+		writer.write(&encode_int_be_base32_fixed(self.as_unix_timestamp() , 7))
 	}
 }
 
@@ -415,10 +428,12 @@ impl ToBase32 for TaggedField {
 			assert!(len < 1024, "Every tagged field data can be at most 1023 bytes long.");
 
 			writer.write_u5(u5::try_from_u8(tag).expect("invalid tag, not in 0..32"))?;
-			writer.write(&try_stretch(
-				encode_int_be_base32(len as u64),
-				2
-			).expect("Can't be longer than 2, see assert above."))?;
+			writer.write(&encode_int_be_base32_fixed(len as u64, 2))?;
+
+			writer.write(&[
+                            u5::try_from_u8((len / 32) as u8).unwrap(),
+                            u5::try_from_u8((len % 32) as u8).unwrap(),
+                        ])?;
 			payload.write_base32(writer)
 		}

But of course a proper fix would stop using Vec unnecessarily and get rid of these unwraps. For example you could use the upstream ByteIterExt trait then call uint64.to_be_bytes().iter().copied().bytes_to_fes() and then copy the output of that into a fixed-size array. (Or just collect if you want to keep the Vec and address it later, which might be easier to review.)

Then for Ord, it seems this is needed to derive Ord for RawTaggedField. But this is a two-variant enum, and while annoying, it is very easy to manually impl Ord for this:

diff --git a/lightning-invoice/src/lib.rs b/lightning-invoice/src/lib.rs
index 063ae8f40..e6fc5b1bc 100644
--- a/lightning-invoice/src/lib.rs
+++ b/lightning-invoice/src/lib.rs
@@ -419,7 +419,7 @@ impl From<Currency> for Network {
 /// Tagged field which may have an unknown tag
 ///
 /// This is not exported to bindings users as we don't currently support TaggedField
-#[derive(Clone, Debug, Hash, Eq, PartialEq, Ord, PartialOrd)]
+#[derive(Clone, Debug, Hash, Eq, PartialEq)]
 pub enum RawTaggedField {
 	/// Parsed tagged field with known tag
 	KnownSemantics(TaggedField),
@@ -427,6 +427,23 @@ pub enum RawTaggedField {
 	UnknownSemantics(Vec<u5>),
 }
 
+impl PartialOrd for RawTaggedField {
+    fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+impl Ord for RawTaggedField {
+    fn cmp(&self, other: &Self) -> core::cmp::Ordering {
+        match (self, other) {
+            (RawTaggedField::KnownSemantics(..), RawTaggedField::UnknownSemantics(..)) => core::cmp::Ordering::Less,
+            (RawTaggedField::UnknownSemantics(..), RawTaggedField::KnownSemantics(..)) => core::cmp::Ordering::Greater,
+            (RawTaggedField::KnownSemantics(ref a), RawTaggedField::KnownSemantics(ref b)) => a.cmp(b),
+            (RawTaggedField::UnknownSemantics(ref a), RawTaggedField::UnknownSemantics(ref b)) => a.iter().map(|a| a.as_u8()).cmp(b.iter().map(|b| b.as_u8())),
+        }
+    }
+}
+
 /// Tagged field with known tag
 ///
 /// For descriptions of the enum values please refer to the enclosed type's docs.

And both of these changes could be done in commits independent of and prior to any bech32 changes, meaning that their review can be isolated from the rest of your changes.

Also, even if you did want to create your own u5 type, why reimplement things like conversion of u5strings to bytestrings when those exist upstream and can be easily implemented in terms of Fe32? Similar for the char conversions.

Aug 02 '24 11:08 apoelstra

Thanks for looking into this in detail!

You are right that the Default requirement could worked around, but I think the Default trait makes sense for a numerical value, and it doesn't degrade code quality.

For Ord: correct, it is needed for the tagged values in lightning invoices, for the unparsed parts (UnknownSemantics). Ord and PartialOrd could be implmented as you suggest, but even in your implementation it converts to u8 and does the comparison. I think it's just simpler if u5 natively support it. (My -- now discarded -- solution was to wrap Fe32 in a struct with Ord implemented, and use that inside UnknownSemantics.)

If my goal was to keep using Fe32 these suggestions would make it possible.

Also, even if you did want to create your own u5 type, why reimplement things like conversion of u5strings to bytestrings when those exist upstream and can be easily implemented in terms of Fe32? Similar for the char conversions.

You are right, and I had a version like that. However, after weighing the conversions and adaptations needed vs. the code/logic to be duplicated, I arrived at the conclusion that it's easier to essentially duplicate some logic in the way suitable for the LDK project. Lightning invoice parsing is a basic operation for LDK, and I think it makes sense to have a well separated, minimal dependency module within LDK for lightning parsing, I'd like to move into that direction. In general I try to achieve high code reuse, but in this case maybe it's more practical to go towards more self-sufficient solution with no external dependency.

I have written up my concerns in the issue https://github.com/lightningdevkit/rust-lightning/issues/3195 .

But I'm listening to your opinion as well. I may revisit the version without code duplication and check the shortcoming again ( #3181 )

Aug 09 '24 10:08 optout21

For Ord: correct, it is needed for the tagged values in lightning invoices, for the unparsed parts (UnknownSemantics). Ord and PartialOrd could be implmented as you suggest, but even in your implementation it converts to u8 and does the comparison. I think it's just simpler if u5 natively support it.

My suggestion does not convert to u8 anywhere.

And no, it is not "simpler" to implement a nonsensical ordering on a field element when there is a standard notion of "ordered field" which does not apply to the field in question.

this case maybe it's more practical to go towards more self-sufficient solution with no external dependency.

You intend to write your own bech32 checksum encoding and verification code? I will switch to the other issue to avoid splitting the conversation but this is a dramatic failure of the bech32 API if it is unable to do basic checksumming.

Aug 09 '24 13:08 apoelstra

Put back to daft for now, depends on #3234, and it is still open whether to go the route of #3181 .

Aug 12 '24 08:08 optout21

Pivoted back to reusing rust-bech32 as much as possible, now in #3244 (following #3234 ).

Aug 15 '24 11:08 optout21