kaitai_struct
terminator: support multi-byte termination bytes
In the JPEG Interchange Format (including JFIF and SPIFF), the scan segment contains compressed data whose length is not known until the data has been read in full. It is possible, however, to look for a 0xFF byte in the compressed data. If it is followed by 0x00, the 0xFF is escaped and is to be treated as data, not as a marker; if it is followed by another byte (which can take several values), it denotes the start of the next segment of the file.
Ideally there would be a construct similar to:
- id: compressed_data
  terminator:
    - [0xFF, 0xAA] # next_marker_1
    - [0xFF, 0xBB] # next_marker_2
  consume: false
Wildcard bytes, regular expressions, number ranges and other helpers could also be of assistance in defining terminators in other file formats.
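For the JPEG case described above, here is a minimal Python sketch of what such a terminator would have to do by hand. The function name `find_scan_end` is hypothetical and not part of any Kaitai Struct runtime. It follows only the simplified rule stated in this issue (0xFF followed by 0x00 is data, 0xFF followed by anything else ends the scan), so it should be read as an illustration, not as a complete JPEG scanner.

```python
from typing import Optional


def find_scan_end(data: bytes, start: int = 0) -> Optional[int]:
    """Return the offset of the first 0xFF that is not followed by 0x00.

    Hypothetical helper: applies the simplified rule from this issue only.
    Returns None if no such marker is found in `data`.
    """
    pos = start
    while True:
        pos = data.find(b"\xff", pos)
        # No 0xFF left, or 0xFF is the very last byte (follower unknown).
        if pos == -1 or pos + 1 >= len(data):
            return None
        if data[pos + 1] == 0x00:
            # Escaped 0xFF: it is part of the compressed data, keep going.
            pos += 2
            continue
        # 0xFF followed by a non-zero byte: treat it as the next marker.
        # The marker itself is not consumed (like `consume: false`).
        return pos


payload = b"\x01\xff\x00\x02\xff\xd9"
end = find_scan_end(payload)
compressed = payload[:end] if end is not None else payload
```

Here `compressed` holds the bytes before the marker, with the escaped `0xFF 0x00` pair still in place; unescaping would be a separate step.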
Assign this to me
I suggest that, instead of supporting multi-byte sequences specifically as a terminator, we generalize by supporting a rule as a terminator, so that a constant multi-byte sequence would be a special case.
@GreyCat Can this be a temporary implementation until #538 is specified? If so, please assign this to me, since we need to finish the JPEG.
I have been working on handling multi-byte terminators.
Let's suppose that the Scala changes that switch the type of terminator from int to Array[Byte] have been made. (I just replaced the int type with Array[Byte] and made some minor changes, but I would like to write a separate issue about that.)
One useful thing to know is the KMP (Knuth-Morris-Pratt) algorithm, which finds matches in O(N + M), where N is the length of the pattern and M is the length of the text. Because of this, 'read_bytes_term()' stays linear in the number of bytes read, as it is with a single-byte terminator.
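To make the KMP idea concrete, here is a minimal sketch of a multi-byte terminator read over a seekable stream. It is not the code from the commit mentioned in this thread; the names `build_failure_table` and `read_bytes_term_multi`, and the `include_term` / `consume_term` parameters, are assumptions chosen for illustration.

```python
import io


def build_failure_table(pattern: bytes) -> list:
    """KMP failure table: table[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def read_bytes_term_multi(stream, term: bytes,
                          include_term: bool = False,
                          consume_term: bool = True) -> bytes:
    """Read from `stream` until the byte sequence `term` is seen.

    Hypothetical helper; assumes `stream` is seekable when
    consume_term is False.
    """
    if not term:
        raise ValueError("terminator must not be empty")
    table = build_failure_table(term)
    buf = bytearray()
    k = 0  # number of terminator bytes currently matched
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("end of stream reached, but no terminator found")
        buf += b
        c = b[0]
        # Fall back along the failure table on mismatch; no re-reading.
        while k > 0 and c != term[k]:
            k = table[k - 1]
        if c == term[k]:
            k += 1
        if k == len(term):
            break
    if not consume_term:
        # Leave the stream positioned at the start of the terminator.
        stream.seek(-len(term), io.SEEK_CUR)
    if not include_term:
        del buf[-len(term):]
    return bytes(buf)


stream = io.BytesIO(b"abababcXY")
value = read_bytes_term_multi(stream, b"abc")
```

Each input byte is read once, and the fallback loop does amortized constant work per byte, which is where the O(N + M) bound comes from.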
Here is the python-runtime commit with the changes: python-runtime
Any progress on this?