rollinghashcpp icon indicating copy to clipboard operation
rollinghashcpp copied to clipboard

Rolling CRC

Open cmcqueen opened this issue 6 years ago • 4 comments

It's quite possible to do an efficient rolling CRC. Let me know if you're interested in help with that. I could help with the mathematical theory, and practical implementations.

cmcqueen avatar Apr 14 '19 04:04 cmcqueen

Yes, Craig, I am interested.

lemire avatar Apr 14 '19 14:04 lemire

Okay, I'm working on a write-up.

cmcqueen avatar Apr 21 '19 11:04 cmcqueen

Outline

  • Refer to classic doc on CRCs: A Painless Guide to CRC Error Detection Algorithms, by Ross Williams
  • For the purposes of a rolling CRC, it is sensible to avoid doing any initial/final XOR as often done in classic CRC algorithms. That would just add unnecessary complexity for use in a rolling CRC.
  • Classically, CRCs have been described as calculating the remainder of binary long-division. Another way of describing it is, polynomial reduction using a reducing polynomial. Then CRCs can be thought of as operations in a Galois field such as GF(2^32), where the CRC polynomial is the reducing polynomial of the Galois Field.
  • A CRC "permutes" in each byte cycle by being multiplied by 0x100 and then reduced by the polynomial (ie modulo the polynomial in the Galois field). So a CRC value permuted through n bytes can be thought of as being multiplied by pow(0x100, n) modulo the polynomial in the Galois field.
  • CRCs combine linearly (using XOR). So
    • CRC(0x03) = CRC(0x01) XOR CRC(0x02)
    • CRC(0x01 0x02 0x03 0x04 0x05) = CRC(0x01 0x00 0x00 0x00 0x00) XOR CRC(0x00 0x02 0x03 0x04 0x05)
  • So when calculating a n-byte sliding window, the oldest byte can be "removed" from the CRC value by calculating the CRC of the byte "delayed" by n bytes.
  • A look-up table can be made of all 256 byte values "delayed" by n bytes, for an efficient "removal" of each oldest byte from the sliding window in a rolling CRC.
  • Because CRCs combine linearly, a 256-entry look-up table can be constructed by first calculating the CRC of the 8 1-bit values "delayed" by n bytes, then building each of the 256 entries by XORing the appropriate combination of the 8 values.

See

cmcqueen avatar Mar 11 '22 05:03 cmcqueen

+1

lemire avatar Mar 11 '22 13:03 lemire