
Support for variable-length, non-greedy Payload()

Open • martinpelikan opened this issue 8 years ago • 1 comment

tl;dr: It would be nice if Payload() consumed data up to the first null character (b'\x00') when no length is specified. Currently it appears to consume up to the last one.

I'm running into a problem with a Structure containing two null-delimited strings. The protocol doesn't specify the length of these strings. I wrote my own protocol parser before discovering this library, so I already have the protocol broken down into logical segments/Structures. I was hoping to replace my hack job with Suitcase for parsing the individual fields.

Reading the docs, I see this is probably an explanation for my situation:

Parameters: length_provider – The LengthField with which this variable length payload is associated. If not included, it is assumed that the length_provider should consume the remainder of the bytes available in the string. This is only valid in cases where the developer knows that they will be dealing with a fixed sequence of bytes (already boxed).

What would be the best way to work within the framework with such a constraint? Is there any way to make Payload() be lazy rather than greedy?
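For comparison, the non-greedy behaviour asked for above amounts to reading one byte at a time until the first null terminator. The sketch below uses only the standard library. `read_cstring` is a hypothetical helper for illustration and is not part of Suitcase's API.

```python
from io import BytesIO


def read_cstring(stream):
    """Return the bytes before the first null byte, consuming the terminator too.

    Hypothetical helper (not a Suitcase API). Raises EOFError if the stream
    ends before a null byte is found.
    """
    chunks = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended before a null terminator was found")
        if byte == b'\x00':
            return bytes(chunks)
        chunks += byte


stream = BytesIO(b'123.45.67.89-8888\x00Some-String-2\x00rest')
first = read_cstring(stream)   # b'123.45.67.89-8888'
second = read_cstring(stream)  # b'Some-String-2'
remaining = stream.read()      # b'rest' (nothing past the second terminator was consumed)
```

Because each call stops at the first terminator, two such fields can sit next to each other without either one swallowing the other.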

success.py:

from suitcase.structure import Structure
from suitcase.fields import (
    SBInt8,
    SBInt16,
    UBInt32,
    SBInt64,
    Payload,
    Magic
)

class Header(Structure):
    a = UBInt32()
    b = SBInt8()
    c = Payload()  # no length_provider: sized from the bytes left after the fixed-size fields
    e = SBInt64()
    f = SBInt16()
    g = SBInt16()
    h = SBInt16()
    i = SBInt16()

Success output:

In [1]: import success

In [2]: s = success.Header()

In [3]: s.unpack(b'\x00\x00\x007\n123.45.67.89-8888\x00Some-String-2\x00\x00\x00\x01Z\xa5\xfb\xc8\xab\x00\x0e\x00\x15\x00\x01\x00\x00')

In [4]: s
Out[4]: 
Header (
  a=55,
  b=10,
  c=b'123.45.67.89-8888\x00Some-String-2\x00',
  e=1488843425963,
  f=14,
  g=21,
  h=1,
  i=0,
)
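Here is one way to see why success.py parses and failure.py does not. When a structure has exactly one field of unknown length, that field's size is whatever remains after subtracting the fixed-size fields from the total. The sketch below models that rule. `infer_lengths` is hypothetical and is not how Suitcase is implemented internally. It only illustrates the constraint.

```python
def infer_lengths(total_length, sizes):
    """Fill in the single unknown (None) entry of `sizes` so the sizes sum to total_length.

    Hypothetical model of greedy length inference: with two or more unknowns
    the split is ambiguous, so a ValueError is raised.
    """
    unknown = [i for i, size in enumerate(sizes) if size is None]
    if len(unknown) > 1:
        raise ValueError("%d fields have no length; the split is ambiguous" % len(unknown))
    known = sum(size for size in sizes if size is not None)
    if not unknown:
        if known != total_length:
            raise ValueError("fixed sizes do not match the data length")
        return list(sizes)
    remainder = total_length - known
    if remainder < 0:
        raise ValueError("data is shorter than the fixed-size fields")
    result = list(sizes)
    result[unknown[0]] = remainder
    return result


# Byte sizes of the success.py fields a, b, c (Payload), e, f, g, h, i
print(infer_lengths(53, [4, 1, None, 8, 2, 2, 2, 2]))
# → [4, 1, 32, 8, 2, 2, 2, 2]
```

The sample message is 53 bytes long. The fixed fields of success.py take 21 bytes, which leaves 32 for c, matching the output above. The failure.py layout, `[4, 1, None, None, 8, 2, 2, 2, 2]`, has two unknowns and raises ValueError in this model. That appears to be the same ambiguity behind the "tried to read None bytes" error below.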

failure.py:

from suitcase.structure import Structure
from suitcase.fields import (
    SBInt8,
    SBInt16,
    UBInt32,
    SBInt64,
    Payload,
    Magic
)

class Header(Structure):
    header_size = UBInt32()
    version = SBInt8()
    nis_id = Payload()  # no length_provider
    msg_id = Payload()  # no length_provider either: two unsized fields in a row
    timestamp = SBInt64()
    event_size = SBInt16()
    job_discard_size = SBInt16()
    num_jobs = SBInt16()
    num_discards = SBInt16()

Failure output:

In [1]: import failure

In [2]: f = failure.Header()

In [3]: f.unpack(b'\x00\x00\x007\n123.45.67.89-8888\x00Some-String-2\x00\x00\x00\x01Z\xa5\xfb\xc8\xab\x00\x0e\x00\x15\x00\x01\x00\x00')
---------------------------------------------------------------------------
SuitcaseParseError                        Traceback (most recent call last)
<ipython-input-3-91ea443f72a0> in <module>()
----> 1 f.unpack(b'\x00\x00\x007\n123.45.67.89-8888\x00Some-String-2\x00\x00\x00\x01Z\xa5\xfb\xc8\xab\x00\x0e\x00\x15\x00\x01\x00\x00')

/home/mpelikan/.local/lib/python3.6/site-packages/suitcase/structure.py in unpack(self, data, trailing)
    339 
    340     def unpack(self, data, trailing=False):
--> 341         return self._packer.unpack(data, trailing)
    342 
    343     def pack(self):

/home/mpelikan/.local/lib/python3.6/site-packages/suitcase/structure.py in unpack(self, data, trailing)
     62     def unpack(self, data, trailing=False):
     63         stream = BytesIO(data)
---> 64         self.unpack_stream(stream)
     65         stream.tell()
     66         if trailing:

/home/mpelikan/.local/lib/python3.6/site-packages/suitcase/structure.py in unpack_stream(self, stream)
    150                                              "%r we tried to read %s bytes but "
    151                                              "we were only able to read %s." %
--> 152                                              (_name, length, len(data)))
    153                 try:
    154                     field.unpack(data)

SuitcaseParseError: While attempting to parse field 'd' we tried to read None bytes but we were only able to read 32.
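Outside of Suitcase, the failing frame can be decoded with `struct` for the fixed-size fields plus a scan for the two null terminators. This is a stdlib-only sketch under some assumptions:

- Big-endian encoding, as the UB/SB field names and the success output suggest.
- Both strings are ASCII and null-terminated.
- `parse_header` and its dictionary keys are illustrative, not part of Suitcase.

```python
import struct

HEAD = struct.Struct('>Ib')     # header_size (UBInt32), version (SBInt8)
TAIL = struct.Struct('>qhhhh')  # timestamp (SBInt64), then four SBInt16 fields


def parse_header(data):
    header_size, version = HEAD.unpack_from(data, 0)
    offset = HEAD.size
    strings = []
    for _ in range(2):
        # Non-greedy: stop at the first null after offset (ValueError if none)
        end = data.index(b'\x00', offset)
        strings.append(data[offset:end].decode('ascii'))
        offset = end + 1  # skip the terminator
    (timestamp, event_size, job_discard_size,
     num_jobs, num_discards) = TAIL.unpack_from(data, offset)
    return {
        'header_size': header_size,
        'version': version,
        'nis_id': strings[0],
        'msg_id': strings[1],
        'timestamp': timestamp,
        'event_size': event_size,
        'job_discard_size': job_discard_size,
        'num_jobs': num_jobs,
        'num_discards': num_discards,
    }


frame = (b'\x00\x00\x007\n123.45.67.89-8888\x00Some-String-2\x00'
         b'\x00\x00\x01Z\xa5\xfb\xc8\xab\x00\x0e\x00\x15\x00\x01\x00\x00')
fields = parse_header(frame)
print(fields['nis_id'], fields['msg_id'], fields['timestamp'])
# → 123.45.67.89-8888 Some-String-2 1488843425963
```

The strings are consumed before the fixed-size tail is unpacked. This means the leading null bytes of the timestamp are never mistaken for string terminators.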

martinpelikan, Mar 07 '17 01:03

Since the two strings in my case are found in succession and can be read out by the same Payload() field, I've used a bit of a hack to work around this for now.

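from suitcase.structure import Structure
from suitcase.fields import FieldProperty, Payload

class Header(Structure):
    ...
    # One greedy Payload() holds both null-terminated strings
    _c_d = Payload()
    # Split on the null byte; .decode() returns the text (str() on bytes gives "b'...'" in Python 3)
    c = FieldProperty(
        _c_d, onget=lambda v: v.split(b'\x00')[0].decode('ascii'))
    d = FieldProperty(
        _c_d, onget=lambda v: v.split(b'\x00')[1].decode('ascii'))
    ...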
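The `onget` callbacks in this workaround just split the combined payload on the null byte. The sketch below runs the same logic on plain bytes, using the 32-byte value of `c` from the success output.

On Python 3, `str()` of a bytes object returns its repr (`"b'...'"`), so `.decode()` is what yields the text itself. ASCII encoding is an assumption here.

```python
payload = b'123.45.67.89-8888\x00Some-String-2\x00'

parts = payload.split(b'\x00')
# split() leaves an empty trailing element because the payload ends with a null byte
print(parts)  # → [b'123.45.67.89-8888', b'Some-String-2', b'']

print(str(parts[0]))             # → b'123.45.67.89-8888'  (the repr, not the text)
print(parts[0].decode('ascii'))  # → 123.45.67.89-8888
```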

I looked through the closed issues and PRs and noticed a similar request in #21. In my case all of the data is within a fixed frame, but there are two Payload() fields of unknown size. In #21, one of them is constrained to a fixed size, which is probably what makes the test there work.

martinpelikan, Mar 14 '17 21:03