Suspected improper padding bit logic
Line 346 of encoder.py intends to extend the data stream buffer with padding bits if it does not meet a codeword boundary.
However, if length % 8 is 0, meaning that the length does indeed fall on the codeword boundary, then this implementation adds 8 padding bits, and it should not do so.
You can see the difference in final forms of the encoding of the string known via segno: segno.make("known", version=1, error='L', mode='byte', mask=0, boost_error=False) vs. the equivalent results from and . All three of these will be decoded to the same word, though.
I concur with this assessment. The line
buff.extend([0] * (8 - (length % 8)))
could be replaced by
buff.extend([0] * (-length % 8))
to get the correct behavior. Note that this works because the modulo operator in Python is defined to return a value with the same sign as its right-hand side, so there is no issue with applying x%8 to a negative number x.
Encoding a micro code with text "abc" only has empty padding and does not use the repeating pattern.
qr = segno.make('abc', micro=True)
This correctly follows this logic:
abcrequires byte characters, not numeric or alphanumeric- M3 is smallest version that allows byte characters
- M is chosen as the ECC mode as there is sufficient length to encode the data
However, the padding after the encoded bits are all zeros; the repeating pattern does not appear. According to spec, it should add a null terminator of 7 bits, then pad with zeros to the next codeword, then pad with the repeating pattern.
For reference, the terminator should be all zeros of the following lengths:
- 4 bits for non-micro codes
- 3 bits for M1
- 5 bits for M2
- 7 bits for M3
- 9 bits for M4
Also, for M1 and M3 codes, the final half-codeword should be all zeros and not contain the repeating pattern.