Request: Allow recovery blocks > source blocks
The linked papers go a bit over my head. Is there an intrinsic reason we can't use recovery blocks > source blocks so long as source + recovery < 65535?
Perhaps a workaround would be to pad up to the recovery block count before encoding, drop the padding blocks in the channel, and re-add them before decoding. I don't see why this wouldn't work, but it would still limit us to 32k recovery blocks.
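Roughly, the padding idea in code (a minimal sketch in C with a hypothetical helper, pad_sources, that is not part of any library API; error handling omitted):

#include <stdlib.h>

/* Extend n real source blocks to r "source" blocks by appending r - n
   all-zero blocks. The zero blocks are never transmitted; the receiver
   recreates them locally before decoding. */
void** pad_sources(void** original, unsigned n, unsigned r, unsigned block_bytes)
{
    void** padded = malloc(r * sizeof(void*));
    for (unsigned i = 0; i < n; ++i)
        padded[i] = original[i];            /* real data, sent over the channel */
    for (unsigned i = n; i < r; ++i)
        padded[i] = calloc(1, block_bytes); /* zero padding, dropped in the channel */
    return padded;
}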
I cannot parse your statement. You always reconstruct from a fixed number of blocks.
Related work in progress: https://github.com/w3f/rs-ec-perf/
There is code in leopard.cpp that disallows encoding when there are more recovery blocks than source blocks:
if (recovery_count <= 0 || recovery_count > original_count)
    return Leopard_InvalidCounts;
I'm wondering why this restriction is there. This doesn't have anything to do with whether the number of blocks is fixed or not.
The math is different for that case, and it is not implemented in this repo.
I believe it's the "encodeL" path that is commented out in this code: https://github.com/catid/Fast-algorithms-of-Reed-Solomon-erasure-codes
If you understand Rust code, I've implemented a Rust library based on Leopard-RS which implements both high rate and low rate encoding/decoding: https://crates.io/crates/reed-solomon-16
I've also started a short documentation of the algorithm here, which describes high/low rate encoding (but not yet decoding): https://docs.rs/reed-solomon-16/0.1.0/reed_solomon_16/algorithm/index.html#encoding
We also implemented a variant of leopard in https://github.com/w3f/rs-ec-perf/ and https://github.com/paritytech/reed-solomon-novelpoly with some help from Ming-Shing Chen, who knows the field better than us. We're not completely done optimizing the extension field arithmetic yet. cc @drskalman @drahnr
I'm afraid our code contains include! everywhere, since our optimization choices impact how we abstract the field, i.e. how the field traits expose the prepared multiplication trick, so writing nice field traits now would harm progress. See https://github.com/drahnr/rs-ec-perf/issues/7
In principle, we'd even consider GF(2^12) if we found the extension field arithmetic considerably faster. I never pushed for doing that since Ming-Shing Chen felt it would be overkill.
As @jerry-skydio wrote above, there are lines that limit the number of recovery blocks.
In "leopard.cpp", at line 134 in leo_encode():
if (recovery_count <= 0 || recovery_count > original_count)
    return Leopard_InvalidCounts;
And at line 245 in leo_decode():
if (recovery_count <= 0 || recovery_count > original_count)
    return Leopard_InvalidCounts;
While testing the behavior and reading the source code, I found an interesting point: it's possible to remove the "recovery_count <= original_count" limit. For 8-bit codes, "recovery_count <= 128" still applies; for 16-bit codes, "recovery_count <= 32768". You may modify the lines like below:
if (recovery_count <= 0)
    return Leopard_InvalidCounts;
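If you want to keep the remaining field-size bound explicit while relaxing the check, a sketch for the 16-bit codec could be (the 32768 value is just the 16-bit limit noted above, not an existing leopard constant):

if (recovery_count <= 0 || recovery_count > 32768) /* 16-bit codec limit */
    return Leopard_InvalidCounts;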
While the Leopard-RS library implements encodeH (the high rate encoder), it simply zero-fills the original data internally. For example, when "original_count = 10" and "recovery_count = 32", it appends 22 zero blocks after the 10 original blocks. With this padding, it calculates 32 recovery blocks from 32 original blocks (10 actual blocks and 22 dummy zeros). You can set the values directly like below:
leo_encode_work_count(10, 32);
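For reference, a sketch of the full encode call for this case, assuming the signatures in leopard.h (buffer sizes must be a multiple of 64 bytes; allocation checks omitted, and I believe the recovery blocks end up in the first recovery_count work buffers):

#include <stdlib.h>
#include "leopard.h"

enum { kOriginal = 10, kRecovery = 32, kBufferBytes = 64 };

void encode_example(const void* const original_data[kOriginal])
{
    leo_init(); /* must be called once before encoding */

    unsigned work_count = leo_encode_work_count(kOriginal, kRecovery);

    void** work_data = malloc(work_count * sizeof(void*));
    for (unsigned i = 0; i < work_count; ++i)
        work_data[i] = malloc(kBufferBytes);

    /* Internally this behaves as if the 10 real blocks were padded
       with 22 zero blocks, as described above. */
    LeopardResult res = leo_encode(kBufferBytes, kOriginal, kRecovery,
                                   work_count, original_data, work_data);
    (void)res; /* on success, work_data[0..kRecovery-1] hold the recovery blocks */
}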
At recovery time you should input the recovery_count used at encoding time. Even when you recover 10 original blocks, you need to set 32 for recovery_count.
leo_decode_work_count(10, 32);
In this case, although you set 32 for recovery_count, you need only 10 recovery blocks to recover the 10 original blocks. You may supply any 10 of the 32 created recovery blocks, or you may put in all available blocks; it works as long as there are enough recovery blocks.
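A matching decode sketch under the same assumptions (pass NULL for any lost block; note that recovery_count stays 32, as described above):

#include <stdlib.h>
#include "leopard.h"

void decode_example(const void* const original_data[10],  /* NULL where lost */
                    const void* const recovery_data[32])  /* NULL where lost */
{
    unsigned work_count = leo_decode_work_count(10, 32);

    void** work_data = malloc(work_count * sizeof(void*));
    for (unsigned i = 0; i < work_count; ++i)
        work_data[i] = malloc(64); /* same 64-byte blocks as at encode time */

    /* Any 10 surviving recovery blocks (plus surviving originals) suffice. */
    LeopardResult res = leo_decode(64, 10, 32, work_count,
                                   original_data, recovery_data, work_data);
    (void)res; /* on success, work_data[i] holds each recovered original block i */
}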
Caution: the block count is rounded up to a power of 2 when calculating recovery data. leo_encode_work_count(10, 10) and leo_encode_work_count(10, 16) create the same recovery data, because 10 rounds up to 16. On the other hand, leo_encode_work_count(10, 16) and leo_encode_work_count(10, 32) create different recovery data. This is why you should set the same numbers at leo_encode_work_count() and leo_decode_work_count(). It's OK to set different values if their power-of-2 sizes are the same: leo_decode_work_count(10, 17) and leo_decode_work_count(10, 32) will work the same way. When recovery_count is less than the power of 2, the missing recovery data is treated as lost.
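A quick way to observe the rounding is to compare work counts (a sketch; this assumes leo_encode_work_count reflects the internal rounded size):

#include <stdio.h>
#include "leopard.h"

int main(void)
{
    leo_init();
    printf("(10,10) -> %u\n", leo_encode_work_count(10, 10)); /* 10 rounds up to 16 */
    printf("(10,16) -> %u\n", leo_encode_work_count(10, 16)); /* same rounded size */
    printf("(10,17) -> %u\n", leo_encode_work_count(10, 17)); /* 17 rounds up to 32 */
    printf("(10,32) -> %u\n", leo_encode_work_count(10, 32)); /* same rounded size */
    return 0;
}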
While that is true, note that encodeH and encodeL are generally not compatible with each other.
What I mean is that if you have "original_count = 10" and "recovery_count = 32" and encode that with encodeH, then you must use decodeH for decoding and can't use decodeL, since that would give a wrong result.
So if a program implements cases like this with encodeH, then changing to use encodeL instead in the future would be a breaking change.
(PS: There are also speed and memory usage differences between encodeH and encodeL. I have some notes on speed here, but haven't documented memory usage yet.)