
Implement simulator

Open senier opened this issue 3 years ago • 0 comments

Background

Goals:

  • Replace PyRFLX

  • From a RecordFlux specification

    • Parse messages easily
    • Generate messages
    • Run sessions
  • Pythonic / natural implementation

    • Strict checking of invariants nonetheless
    • Suitable for typed Python
    • All values that can be calculated are set automatically
    • Values that are set automatically cannot be set manually
    • Code should have no style issues (e.g. pylint)
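The two goals about automatically calculated values can be illustrated with a read-only property. This is only a minimal sketch: the class `TLVMessage` and its attributes are hypothetical names chosen for illustration and are not part of RecordFlux or PyRFLX.

```python
class TLVMessage:
    """Hypothetical message object, for illustration only."""

    def __init__(self) -> None:
        self._value = b""

    @property
    def value(self) -> bytes:
        return self._value

    @value.setter
    def value(self, data: bytes) -> None:
        # Invariant from the Length type: the size must fit into 16 bits.
        if len(data) > 2**16 - 1:
            raise ValueError("value does not fit into a 16-bit length field")
        self._value = data

    @property
    def length(self) -> int:
        # Derived from the value. No setter is defined, so assignment fails.
        return len(self._value)


message = TLVMessage()
message.value = b"\xde\xad"

try:
    message.length = 5  # type: ignore[misc]
except AttributeError:
    rejected = True
else:
    rejected = False
```

Because `length` has no setter, both the runtime and a type checker reject a manual assignment, while the value is still available for reading.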

Related Work - Message Parsing / Generation

https://github.com/Componolit/rflx_simulator_experiments

Examples

(tests/data/specs/tlv.rflx)

package TLV is

   type Tag is (Msg_Data => 1, Msg_Error => 3) with Size => 8;
   type Length is range 0 .. 2 ** 16 - 1 with Size => 16;

   type Message is
      message
         Tag : Tag
            then Length
               if Tag = Msg_Data
            then null
               if Tag = Msg_Error;
         Length : Length
            then Value
               with Size => Length * 8;
         Value : Opaque;
      end message;
end TLV;

Test Data - Msg_Data (TEST_DATA_DATA in the examples below)

01 00 02 de ad

Test Data - Msg_Error

03

Test Data - Invalid Tag

02
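As a baseline for the library comparison, the behaviour expected from the TLV specification can be written by hand. The sketch below follows the enumeration values of the specification (Msg_Data => 1, Msg_Error => 3) and assumes a big-endian Length field, which matches the Msg_Data test data. The names `Tag` and `parse_tlv` are hypothetical.

```python
import struct
from enum import IntEnum
from typing import Optional, Tuple


class Tag(IntEnum):
    MSG_DATA = 1
    MSG_ERROR = 3


def parse_tlv(data: bytes) -> Tuple[Tag, Optional[bytes]]:
    """Parse one TLV message and return the tag and the value (if any)."""
    if not data:
        raise ValueError("empty input")
    tag = Tag(data[0])  # raises ValueError for values outside the enumeration
    if tag is Tag.MSG_ERROR:
        # The message ends after the tag ("then null").
        return tag, None
    if len(data) < 3:
        raise ValueError("truncated Length field")
    (length,) = struct.unpack(">H", data[1:3])  # 16 bit, assumed big-endian
    value = data[3 : 3 + length]
    if len(value) != length:
        raise ValueError("truncated Value field")
    return tag, value


data_message = parse_tlv(bytes.fromhex("01 00 02 de ad"))
error_message = parse_tlv(bytes.fromhex("03"))
```

A tag byte that is not part of the enumeration makes `Tag(...)` raise `ValueError`, which is the strict behaviour the goals ask for.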

Construct

https://github.com/construct/construct

Specification

CONSTRUCT_TLV = construct.Struct(
   "tag" / construct.Enum(construct.Int8ub, MSG_DATA=1, MSG_ERROR=3),
   construct.StopIf(construct.this.tag != 1),
   "length" / construct.Int16ub,
   "value" / construct.Bytes(construct.this.length)
)

This is probably wrong: it does not make "length" and "value" optional as I would have expected. The project has extensive documentation, but it often does not cover the more complicated cases.

Parsing

result = CONSTRUCT_TLV.parse(b"\x01\x00\x02\xde\xad")
assert result.tag == CONSTRUCT_TLV.tag.MSG_DATA

# The specification does not parse correctly!
# assert result.length == 2
# assert result.value == b"\xde\xad"

Generation

assert (
   CONSTRUCT_TLV.build({"tag": 1, "length": 2, "value": b"\xde\xad"})
   == b"\x01\x00\x02\xde\xad"
)

Hachoir

https://github.com/vstinner/hachoir

Specification

class HachoirTLV(Parser):
   tag_types = {
      1: "Msg_Data",
      3: "Msg_Error"
   }

   endian = hachoir.stream.BIG_ENDIAN

   def createFields(self):
      yield hachoir.field.Enum(
         hachoir.field.UInt8(self, "tag", "Tag"), self.tag_types
      )

      # Length and value are only present for Msg_Data
      if self["tag"].value == 1:
         yield hachoir.field.UInt16(self, "length", "Length")
         yield hachoir.field.Bytes(self, "value", self["length"].value)

Parsing

tlv = HachoirTLV(hachoir.stream.StringInputStream(TEST_DATA_DATA))
assert tlv["tag"].value == 1
assert tlv["length"].value == 2
assert tlv["value"].value == b'\xde\xad'

Generation

Not supported (there is an editor module to change parsed data, though).

Kaitai Struct

https://kaitai.io/

Specification

meta:
   id: tlv
   endian: be
seq:
   - id: tag
     type: u1
     enum: tag
   - id: len_value
     type: u2
     if: tag == tag::data
   - id: value
     size: len_value
     if: tag == tag::data
enums:
   tag:
     1: data
     3: error

The specification is translated to Python code using the Kaitai Struct compiler (ksc):

$ ksc --target python tlv.ksy

The resulting tlv.py file contains the parser. A support library (kaitaistruct) is required for it to work.

Parsing

tlv = kaitai.Tlv.from_bytes(TEST_DATA_DATA)
assert tlv.tag == kaitai.Tlv.Tag.data
assert tlv.len_value == 2
assert tlv.value == b'\xde\xad'

Generation

Not supported

Python Suitcase

https://github.com/digidotcom/python-suitcase

Specification

class SuitcaseTLV(Structure):
   tag = suitcase.fields.UBInt8()
   length = suitcase.fields.ConditionalField(
      suitcase.fields.LengthField(suitcase.fields.UBInt16()),
      lambda m: m.tag == 1
   )
   value = suitcase.fields.ConditionalField(
      suitcase.fields.Payload(length),
      lambda m: m.tag == 1
   )

Parsing

tlv = SuitcaseTLV()
tlv.unpack(TEST_DATA_DATA)
assert tlv.tag == 1
assert tlv.length == 2
assert tlv.value == b'\xde\xad'

Generation

tlv = SuitcaseTLV()
tlv.tag = 1

# Length field is calculated automatically
# When trying to set it, we get:
# suitcase.exceptions.SuitcaseProgrammingError:
# Cannot set the value of a LengthField
# tlv.length = 2

tlv.value = b'\xde\xad'
assert tlv.pack() == TEST_DATA_DATA

Scapy

https://scapy.net/

Specification

class ScapyTLV(scapy.Packet):
    fields_desc = [
        scapy.ByteEnumField(
            "tag",
            0,
            {
                1: "DATA",
                3: "ERROR"
            }
        ),
        scapy.ConditionalField(
            scapy.FieldLenField("len", None, length_of="value"),
            lambda pkt: pkt.tag == 1
        ),
        scapy.ConditionalField(
            scapy.StrLenField(
                "value",
                "",
                length_from=lambda pkt: pkt.len
            ),
            lambda pkt: pkt.tag == 1
        )
    ]

Parsing

result = ScapyTLV(TEST_DATA_DATA)
assert result.tag == 1
assert result.len == 2
assert result.value == b'\xde\xad'

Generation

result = ScapyTLV(tag = 1, value = b'\xde\xad')
assert scapy.raw(result) == TEST_DATA_DATA

Message Parser Design

Option 1.1

simulator = rflx.Simulator("tlv.rflx").tlv.message
message = simulator.parse(data)

Pro: Natural interface
Con: Static typing may be impossible / hard

Option 1.2

message = rflx.Simulator("tlv.rflx", ["TLV", "Message"], data)

Pro: Short, easier / more natural to use programmatically
Con: Generation is asymmetric, no reuse of the same model with different data

Option 1.3

message = rflx.Simulator("tlv.rflx").tlv.message
message.unpack(data)

Pro: Symmetric interface possible, reuse of previously parsed message
Con: Stateful

Option 1.4

simulator = rflx.Simulator("tlv.rflx")["TLV"]["Message"]
message = simulator.parse(data)

Pro: Easier / more natural to use programmatically
Con: Static typing may be impossible / hard

Option 1.5

simulator = rflx.Simulator("tlv.rflx")
message = simulator.tlv.message(checksum=lambda x: crc(x))
message.parse(data)

Pro: The place where checksum functions (and later parameters) are passed is consistent with the specification; the package hierarchy in the spec is mirrored by the code
Con: Checksum needs to be passed whenever a message is constructed

Option 1.6

@rflx.simulator.from_file("tlv.rflx")
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(self, x: int) -> int:
        return crc(x)

simulator = MySimulator()
simulator.tlv.message.parse(data)

Alternative version with inline specification:

@rflx.simulator.from_string(
"""
package TLV is

   type Tag is (Msg_Data => 1, Msg_Error => 3) with Size => 8;
   type Length is range 0 .. 2 ** 16 - 1 with Size => 16;

   type Message is
      message
         Tag : Tag
            then Length
               if Tag = Msg_Data
            then null
               if Tag = Msg_Error;
         Length : Length
            then Value
               with Size => Length * 8;
         Value : Opaque;
      end message;
end TLV;
"""
)
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(self, x: int) -> int:
        return crc(x)

Pro: Can be statically type-checked by mypy, central place for checksum functions, different child classes associated with different specs
Con: Name mangling of checksum function may become confusing, more code
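How the decorator in Option 1.6 could associate a specification with a class and find the checksum functions by their mangled names is sketched below. Everything here is an assumption made for illustration: `Simulator`, `from_file`, `spec_file` and `checksum_functions` are stand-ins and not the actual `rflx.simulator` API, and the checksum body is a placeholder rather than a real CRC.

```python
from typing import Callable, Dict, Type


class Simulator:
    """Hypothetical base class standing in for rflx.simulator.Simulator."""

    spec_file: str = ""

    def checksum_functions(self) -> Dict[str, Callable[[bytes], int]]:
        # Collect methods that follow the <package>_<message>_checksum scheme.
        suffix = "_checksum"
        return {
            name[: -len(suffix)]: getattr(self, name)
            for name in dir(self)
            if name.endswith(suffix) and callable(getattr(self, name))
        }


def from_file(path: str) -> Callable[[Type[Simulator]], Type[Simulator]]:
    """Class decorator that records which specification a class belongs to."""

    def decorate(cls: Type[Simulator]) -> Type[Simulator]:
        cls.spec_file = path  # a real implementation would parse the file here
        return cls

    return decorate


@from_file("tlv.rflx")
class MySimulator(Simulator):
    def tlv_message_checksum(self, data: bytes) -> int:
        return sum(data) % 256  # placeholder, not a real CRC


simulator = MySimulator()
functions = simulator.checksum_functions()
```

The lookup key `tlv_message` is derived purely from the method name, which is the name mangling listed as a drawback of this option.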

Conclusion

~~Implement 1.5~~ (mypy does not provide hooks necessary to check this version)
Implement 1.6

Data Getter Design

Option 2.1

tag = message.tag
value = message.value

Pro: Better readability, natural to use
Con: Translation from RecordFlux names to Python necessary to avoid style check issues

Option 2.2

tag = message["Tag"]
value = message["Value"]

Pro: Field names identical to spec, iteration over fields could be implemented on top
Con: Verbose

Option 2.3

tag = message.get("Tag")
value = message.get("Value")

Pro: Field names identical to spec
Con: Verbose

Conclusion

Implement 2.1
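The name translation that Option 2.1 requires could look roughly like the following. This is a sketch under assumptions: `python_name` and `Message` are hypothetical, lower-casing is just one possible mapping, and clashes with Python keywords or between names that differ only in case are not handled.

```python
from typing import Dict


def python_name(rflx_name: str) -> str:
    """Translate a specification name such as "Tag" to "tag"."""
    return rflx_name.lower()


class Message:
    """Hypothetical read-only view on already parsed fields."""

    def __init__(self, fields: Dict[str, object]) -> None:
        self._fields = {python_name(name): value for name, value in fields.items()}

    def __getattr__(self, name: str) -> object:
        # Only called when normal attribute lookup fails.
        fields = self.__dict__.get("_fields", {})
        try:
            return fields[name]
        except KeyError:
            raise AttributeError(name) from None


message = Message({"Tag": 1, "Value": b"\xde\xad"})
tag = message.tag
value = message.value
```

A dynamic `__getattr__` like this is convenient but invisible to mypy, so a statically typed variant would need explicitly declared attributes or generated classes.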

Data Setter Design

Option 3.1

message.tag = tag
message.value = value

Pro: Better readability, natural to use
Con: Translation from RecordFlux names to Python necessary to avoid style check issues

Option 3.2

message["Tag"] = tag
message["Value"] = value

Pro: Field names identical to spec, iteration over fields could be implemented on top
Con: Verbose, calls need to be in the right order

Option 3.3

message.set("Tag", tag)
message.set("Value", value)

Pro: Field names identical to spec
Con: Verbose, calls need to be in the right order

Option 3.4

message.set({"Tag": tag, "Value": value})

Pro: Field names identical to spec, great flexibility
Con: Partial update may not be possible

~~Option 3.5~~

~~message = { "Tag": value }~~

Assignment cannot be overloaded in Python

Conclusion

Implement 3.1

Message Serializer Design

Option 4.1

data = message.serialize()

Pro: Natural interface
Con:

Option 4.2

data = bytes(message)

Pro: Natural interface, very pythonic
Con:

Conclusion

Implement 4.2
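Option 4.2 relies on the `__bytes__` special method, which Python calls for `bytes(obj)`. A minimal sketch for the TLV message follows; the class `Message` is hypothetical and the Length field is assumed to be big-endian, as in the test data.

```python
import struct


class Message:
    """Hypothetical TLV message holding a tag and a value."""

    def __init__(self, tag: int, value: bytes) -> None:
        self.tag = tag
        self.value = value

    def __bytes__(self) -> bytes:
        # Tag (8 bit), Length (16 bit, big-endian), then the raw value.
        return struct.pack(">BH", self.tag, len(self.value)) + self.value


data = bytes(Message(1, b"\xde\xad"))
```

The result can be passed directly to APIs that expect `bytes`, for example `socket.send`.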

Checksum Design

Option 5.1

message.checksum = lambda x: crc(x)

Pro: Natural interface
Con: Must be set per message, not suitable for parsing

Option 5.2

simulator = rflx.Simulator(
    "tlv.rflx",
    checksums={
        "TLV": {
            "Message": {
                "checksum": lambda x: crc(x)
            }
        }
    }
).tlv.message

message = simulator.parse(data)

Pro: Checksum only set per simulator instance
Con: TLV.Message addressed in two distinct places / ways

Option 5.3

Cf. 1.5

Conclusion

Implement 5.3

Enumeration Literals Design

Option 6.1

if message.tag == 3:
   pass

Pro: Compatible with int
Con: Error prone, user needs to perform mapping from enum to integer manually

Option 6.2

if message.tag == simulator.tlv.msg_error:
   pass

Pro: Compatible with int (when based on IntEnum)
Con:

Conclusion

Implement 6.2
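The int compatibility mentioned for Option 6.2 comes from `enum.IntEnum` in the standard library. The sketch below uses a hypothetical `Tag` class with the values from the TLV specification; how the simulator would actually expose the literals is not defined here.

```python
from enum import IntEnum


class Tag(IntEnum):
    MSG_DATA = 1
    MSG_ERROR = 3


parsed_tag = Tag(3)  # lookup by value, as a parser would do

# Members compare equal to plain integers and can be used where ints are expected.
is_error = parsed_tag == Tag.MSG_ERROR
equals_int = parsed_tag == 3
encoded = bytes([Tag.MSG_ERROR])
```

Code written against Option 6.1 (`message.tag == 3`) would therefore keep working, while new code can use the symbolic names.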

Summary - Message Parser

@rflx.simulator.from_file("tlv.rflx")
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(self, x: int) -> int:
        return crc(x)

simulator = MySimulator()
simulator.tlv.message.parse(data)

tag = simulator.tlv.message.tag
value = simulator.tlv.message.value

if tag == simulator.tlv.msg_error:
   simulator.tlv.message.tag = new_tag

socket.send(bytes(simulator.tlv.message))

Related Work - State Machines

Pysmlib

https://darcato.github.io/pysmlib/docs/html/index.html

FiniteStateMachines

https://github.com/jaypantone/FiniteStateMachines

PythonStateMachine

https://python-statemachine.readthedocs.io/en/latest/index.html

Transitions

https://github.com/pytransitions/transitions

PySM

https://pysm.readthedocs.io/en/latest/#

StateEngine

https://github.com/aymanimtyaz/StateEngine

senier, Dec 09 '21 11:12