leopard Question regarding the API

Question regarding the API

Open liamsi opened this issue 4 years ago • 4 comments

I'm writing a go-wrapper for this implementation; BTW the C wrapper makes that really convenient, thanks for that!

I struggle to completely understand the API though. What I'm trying to do is quite basic: given n original data buffers or symbols, I want to generate n "parity symbols" s.t. that I can loose any n (of the now 2*n total) symbols and still recover the data.

I mainly experimented with the API in banchmarks.cpp: OK, sounds like I would need to call leo_encode with original_count == recovery_count for that, right? leo_encode_work_count returns 2*original_count for these params and calling leo_encode yields encode_work_data with the size 2*original_count as expected. So far so good. What is surprising though: I would have expected encode_work_data to contain the original data too instead it does seem to contain 2*original_count recovery or parity symbols. So is there no way I can achieve what described above using the public C API or am I using recovery_count wrong?

Thanks in advance :-)

May 12 '20 15:05 liamsi

Looking through the paper that describes the encode and ReedSolomonEncode it seems like the code reads mostly like a straightforward implementation of this:

Source: https://www.citi.sinica.edu.tw/papers/whc/3450-F.pdf

So with that notation, n=2k and I want to hand over k message symbols and get k parity symbols.

What would be the correct parameters for that? As described above leo_encode returns 2k symbols instead (which do not clearly contain the original symbols, at least not in a non-encoded form).

May 12 '20 21:05 liamsi

I think we (or mainly @musalbas) figured it out: if the number k is <= 256, leopard behaves exactly as described above. If k> 256 then the recovery bytes will be 2k instead which seems to be a property of ff16 ...

May 14 '20 17:05 liamsi

Well, actually this is not the case (this is independent from ff8 or ff16):

leo_encode_work_count always multiplies the next power of 2 of recovery count by 2: https://github.com/catid/leopard/blob/b58d1eaf59e263d25e473a3460437e19386956eb/leopard.cpp#L102

So e.g. in the case of orig_count == recovery_count == some power of two, this means that half the arrays in encode_work won't be used (the 2nd half will always be just all buffer_bytes sized arrays that are all zeroes). It seems superfluous to always double the data here. Is there any particular reason for this? The encode method also requires this behaviour while there are edge cases (e.g. work_count < 2*recovery_count && work_count == m), where work_count doesn't need to be 2*m): https://github.com/catid/leopard/blob/b58d1eaf59e263d25e473a3460437e19386956eb/leopard.cpp#L162-L166

The workaround for my use-case was dropping the superfluous chunks (the 2nd half) on encode and decode.

May 20 '20 10:05 liamsi

Sorry just noticed your work here. I hope you were able to get through the issue

Jan 02 '21 08:01 catid

leopard leopard copied to clipboard

Question regarding the API

leopard
leopard copied to clipboard