segno icon indicating copy to clipboard operation
segno copied to clipboard

Suspected improper padding bit logic

Open trjr opened this issue 9 months ago • 2 comments

Line 346 of encoder.py intends to extend the data stream buffer with padding bits if it does not meet a codeword boundary.

However, if length % 8 is 0, meaning that the length does indeed fall on the codeword boundary, then this implementation adds 8 padding bits, and it should not do so.

You can see the difference in final forms of the encoding of the string known via segno: segno.make("known", version=1, error='L', mode='byte', mask=0, boost_error=False) vs. the equivalent results from and . All three of these will be decoded to the same word, though.

trjr avatar Mar 27 '25 15:03 trjr

I concur with this assessment. The line

buff.extend([0] * (8 - (length % 8)))

could be replaced by

buff.extend([0] * (-length % 8))

to get the correct behavior. Note that this works because the modulo operator in Python is defined to return a value with the same sign as its right-hand side, so there is no issue with applying x%8 to a negative number x.

sidneycadot avatar Apr 06 '25 08:04 sidneycadot

Encoding a micro code with text "abc" only has empty padding and does not use the repeating pattern.

qr = segno.make('abc', micro=True)

This correctly follows this logic:

  • abc requires byte characters, not numeric or alphanumeric
  • M3 is smallest version that allows byte characters
  • M is chosen as the ECC mode as there is sufficient length to encode the data

However, the padding after the encoded bits are all zeros; the repeating pattern does not appear. According to spec, it should add a null terminator of 7 bits, then pad with zeros to the next codeword, then pad with the repeating pattern.

For reference, the terminator should be all zeros of the following lengths:

  • 4 bits for non-micro codes
  • 3 bits for M1
  • 5 bits for M2
  • 7 bits for M3
  • 9 bits for M4

Also, for M1 and M3 codes, the final half-codeword should be all zeros and not contain the repeating pattern.

Shane32 avatar Apr 19 '25 14:04 Shane32