labrador-ldpc Seeking usage advice

Hi Adam,

I hope you're doing well. While researching error detection and correction coding on crates.io, I came across your labrador_ldpc project. Thanks for your valuable contribution! I'm aiming to develop a Python plugin that protects some important byte streams using error correction. For efficiency, the core logic will be implemented in Rust. Your project was very inspiring; however, although I am familiar with Python, I am less familiar with Rust and do not have much experience with error detection and correction coding (my only experience is with the (7,4) Hamming code). As a result, despite reading your project documentation, I am still somewhat confused.

I would like to ask you some questions about labrador_ldpc. If you have the time, could you please help me? There are several issues I would like your insight on as the author.

As described above, my requirements are relatively simple: I want to encode arbitrary byte streams into larger byte streams to allow automatic error correction during transmission or storage. I have a significant amount of important data that needs protection, and I have observed several bit flips in physical storage media. While these errors are not severe, I believe they can be corrected using error correction techniques. In addition, due to the large amount of data, I want to maximize computational efficiency.

Based on these requirements, I hope you can help me clarify how to use your library's logic and integrate it into my Python plugin. Below I have outlined several steps and assumptions; please help me verify their accuracy.

1. Read the byte stream from a Python data structure and represent it in Rust as &[u8] or Vec<u8>. The latter requires a memory copy.
2. Since bit flips are rare in consumer-grade storage media and multiple errors are unlikely to occur simultaneously, I believe a minimal-redundancy code is appropriate to provide basic protection. In addition, since the byte stream is typically large, the block size should not be too small. For example, TM1280 seems quite suitable. If I understand correctly, each block accepts 128 bytes and adds 32 bytes of redundancy. I would like to know how many bytes of error it can correct.
3. After reading the original input, since the length of the input byte stream may not be a multiple of the block size, I have to add some padding bytes at the end. This seems to make a memory copy unavoidable. It seems I need to create a new txcode of length 160 to store the encoding results.
4. During this process, it seems possible to use Rust's unsafe features to convert a u8 pointer to a u64 pointer to improve conversion performance without incurring the memory copy overhead. I am not very familiar with these features and would like your advice.
5. Finally, use your LDPCCode::TM1280 to encode into the txcode and then write it to the output byte stream.
6. If the input byte stream is long, I have to split it into blocks myself, then repeat the above process and finally combine the output results.
7. I would like to know whether using decode_bf for decoding is appropriate to improve efficiency in my scenario. Regarding the LDPC algorithm, can the decoder automatically detect the end of the file (so that the byte stream can be restored to the original input length, rather than a multiple of the block size)? Also, could you explain the meaning of performing 20 decoding iterations and whether this is a value you recommend?
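
The padding and block-splitting steps above can be sketched as follows. This is a minimal illustration in Python rather than the Rust core, and it leaves out the actual LDPC encode and decode calls. It assumes the 128-byte data block and 160-byte codeword sizes mentioned for TM1280. The 8-byte length header and the helper names `split_into_blocks` and `join_blocks` are hypothetical and not part of labrador_ldpc. Because a block decoder only hands back fixed-size blocks, the sketch stores the original length explicitly so the padding can be removed afterwards.

```python
import struct
from typing import List

BLOCK_DATA_BYTES = 128  # data bytes per TM1280 block (k = 1024 bits)
BLOCK_CODE_BYTES = 160  # codeword bytes per TM1280 block (n = 1280 bits)

# Hypothetical framing: an 8-byte big-endian length prefix, so the original
# length survives the zero padding. This is an assumption of the sketch.
HEADER = struct.Struct(">Q")


def split_into_blocks(payload: bytes) -> List[bytes]:
    """Prefix the payload with its length, zero-pad, and cut into data blocks."""
    framed = HEADER.pack(len(payload)) + payload
    padding = (-len(framed)) % BLOCK_DATA_BYTES
    framed += bytes(padding)
    return [
        framed[i:i + BLOCK_DATA_BYTES]
        for i in range(0, len(framed), BLOCK_DATA_BYTES)
    ]


def join_blocks(blocks: List[bytes]) -> bytes:
    """Concatenate decoded data blocks and strip the header and padding."""
    framed = b"".join(blocks)
    (length,) = HEADER.unpack_from(framed)
    return framed[HEADER.size:HEADER.size + length]
```

Each 128-byte block returned by `split_into_blocks` would then be passed to the encoder to produce one 160-byte codeword, and `join_blocks` would be applied to the decoder's output blocks.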

These are my understanding and questions about your project. I hope you can help me clarify them. If you have the time, I would like to discuss further with you to better understand your project. Thank you for your patience in reading this and I look forward to your response!

Jan 07 '25 13:01 GoodManWEN

It's hard to be sure without some more details about your application, but typically I would expect that other error coding techniques would serve you better than the relatively short block LDPC codes in this project. These are really optimised for sending small telecommand or telemetry packets (at most 1kB but often smaller) over very unreliable radio links. In this scenario it's worth spending a decent amount of computational and transmission resource on error coding, so these codes are generally low rate (in other words, high overhead). Additionally, the basic error model these codes are designed to work best with is that you receive some corrupted real-valued signal for each bit, in other words something like [0.02, 0.04, 0.98, 0.62] for [0, 0, 1, 1], but while the final digit could have been a corrupted 0 it's quite unlikely that the first digit was a corrupted 1. In comparison it sounds like your data is likely to suffer bit flips and you only get binary-valued signals, which is a different (and harder) class of error correction. It's not that these codes can't work like that, but it's not as effective as when you have these real-valued observations.
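
To make the difference between real-valued and binary-valued observations concrete, here is a small Python sketch using the example values above. The 0.5 threshold and the `confidence` measure are illustrative assumptions, not the soft-information format the library expects. The point is only that a hard decision throws away the information that the last sample is much less trustworthy than the others.

```python
def hard_decision(samples):
    """Threshold each real-valued sample to a bit; the reliability is lost."""
    return [1 if s > 0.5 else 0 for s in samples]


def confidence(samples):
    """Illustrative reliability in [0, 1]: distance from the 0.5 threshold."""
    return [abs(s - 0.5) * 2 for s in samples]


samples = [0.02, 0.04, 0.98, 0.62]
bits = hard_decision(samples)
conf = confidence(samples)

# Index of the least reliable sample: the one a soft-decision decoder would
# be most willing to change, and which a hard-decision decoder cannot
# distinguish from the others.
weakest = min(range(len(samples)), key=lambda i: conf[i])
```

Here `bits` comes out as `[0, 0, 1, 1]` and `weakest` points at the final sample (0.62), matching the description above. With stored data that only ever reads back as 0 or 1, every bit looks equally reliable to the decoder.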

For error correction on large files or large amounts of data, you'd probably be better placed with some variety of Reed-Solomon coding. To be honest though it sounds like you might be best using a file system that provides integrity checking and error correction for you, without having to implement your own error correction on top. If you're seeing bit flips in files you store on commodity hard drives, the hard drive has failed badly - any error like this should be detected and flagged and indicates the hard drive has become defective. Backups or a file system that handles error correction are a more traditional solution.
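
As a rough illustration of the integrity-checking half of that advice, the Python sketch below appends a CRC-32 to a block and uses it to detect a single flipped bit. It only detects the error and cannot correct it, and it is not meant to represent how any particular file system or drive implements its checks. The helper names `protect`, `is_intact` and `flip_bit` are hypothetical.

```python
import zlib


def protect(block: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the block."""
    return block + zlib.crc32(block).to_bytes(4, "big")


def is_intact(stored: bytes) -> bool:
    """Recompute the CRC-32 over the data and compare it with the stored tag."""
    block, tag = stored[:-4], stored[-4:]
    return zlib.crc32(block).to_bytes(4, "big") == tag


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a bit flip."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


stored = protect(b"important data")
damaged = flip_bit(stored, 5)
```

`is_intact(stored)` is true and `is_intact(damaged)` is false, since a CRC-32 detects every single-bit error. Recovering the original data would still need a backup or an error-correcting code such as Reed-Solomon.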

I think the process you describe would work, just I doubt it would be a good way to solve your problem.

Jan 11 '25 02:01 adamgreig

Thank you so much for your detailed response! It's been very helpful. Since I'm unfamiliar with the library's internal implementation and algorithms, your confirmation will save me a huge amount of debugging time.

Regarding your suggestion to use a file system with built-in error detection and correction, that's undoubtedly a better solution for simple file storage, and it would address most of my current use case. But I'm still inclined to pursue my original idea of developing a byte-stream protection tool, as it could be adapted for a wider range of applications.

After receiving your reply, I did some further research. As summarized by an LLM, my primary requirement is to handle sporadic errors, such as occasional bit flips in storage or memory, typically involving single-bit or double-byte errors. For this particular scenario, LDPC codes seem less appropriate, and RS codes may be a better choice given their effectiveness in handling burst errors.

Thanks again for your kind reply and guidance.

Feb 21 '25 01:02 GoodManWEN
