RecordFlux
Implement simulator
Background
Goals:
- Replace PyRFLX
- From a RecordFlux specification
  - Parse messages easily
  - Generate messages
  - Run sessions
- Pythonic / natural implementation
  - Strict checking of invariants nonetheless
  - Suitable for typed Python
  - All values that can be calculated are set automatically
  - Values that are set automatically cannot be set manually
  - Code should have no style issues (e.g. pylint)
Related Work - Message Parsing / Generation
https://github.com/Componolit/rflx_simulator_experiments
Examples
(tests/data/specs/tlv.rflx)
package TLV is
   type Tag is (Msg_Data => 1, Msg_Error => 3) with Size => 8;
   type Length is range 0 .. 2 ** 16 - 1 with Size => 16;
   type Message is
      message
         Tag : Tag
            then Length
               if Tag = Msg_Data
            then null
               if Tag = Msg_Error;
         Length : Length
            then Value
               with Size => Length * 8;
         Value : Opaque;
      end message;
end TLV;
Test Data - Msg_Data
01 00 02 de ad
Test Data - Msg_Error
03
Test Data - Invalid Tag
02
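To pin down the intended behaviour on these inputs, the message can also be parsed by hand. The following is a minimal sketch using only the Python standard library; it is not part of RecordFlux. `TEST_DATA_DATA` is assumed to hold the Msg_Data bytes, while `TEST_DATA_ERROR`, `TEST_DATA_INVALID` and `parse_tlv` are hypothetical names, and the error and invalid-tag bytes are derived from the Tag enumeration in the specification (Msg_Error => 3).

```python
import struct
from typing import Optional, Tuple

# Test inputs; the error / invalid bytes follow the Tag enumeration (assumption).
TEST_DATA_DATA = bytes.fromhex("010002dead")
TEST_DATA_ERROR = bytes.fromhex("03")
TEST_DATA_INVALID = bytes.fromhex("02")

MSG_DATA = 1
MSG_ERROR = 3


def parse_tlv(data: bytes) -> Tuple[int, Optional[int], Optional[bytes]]:
    """Parse one TLV message by hand and return (tag, length, value)."""
    if not data:
        raise ValueError("empty input")
    tag = data[0]
    if tag == MSG_ERROR:
        # Msg_Error ends the message directly after the tag.
        return tag, None, None
    if tag != MSG_DATA:
        raise ValueError(f"invalid tag: {tag}")
    if len(data) < 3:
        raise ValueError("truncated length field")
    # Length is a 16-bit big-endian integer counting the bytes of Value.
    (length,) = struct.unpack(">H", data[1:3])
    value = data[3 : 3 + length]
    if len(value) != length:
        raise ValueError("truncated value field")
    return tag, length, value
```

Each library below is expected to reproduce these results for the same inputs.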
Construct
https://github.com/construct/construct
Specification
import construct
from construct import this
CONSTRUCT_TLV = construct.Struct(
    "tag" / construct.Enum(construct.Int8ub, MSG_DATA=1, MSG_ERROR=3),
    construct.StopIf(this.tag != 1),
    "length" / construct.Int16ub,
    "value" / construct.Bytes(this.length),
)
This specification is probably wrong: it does not make "length" and "value" optional in the way I expected. The project has extensive documentation, but it often does not cover the more complicated cases.
Parsing
result = CONSTRUCT_TLV.parse(b"\x01\x00\x02\xde\xad")
assert result.tag == CONSTRUCT_TLV.tag.MSG_DATA
# The specification does not parse correctly!
# assert result.length == 2
# assert result.value == b"\xde\xad"
Generation
assert (
    CONSTRUCT_TLV.build({"tag": 1, "length": 2, "value": b"\xde\xad"})
    == b"\x01\x00\x02\xde\xad"
)
Hachoir
https://github.com/vstinner/hachoir
Specification
class HachoirTLV(Parser):
    tag_types = {
        1: "Msg_Data",
        3: "Msg_Error",
    }
    endian = hachoir.stream.BIG_ENDIAN
    def createFields(self):
        yield hachoir.field.Enum(
            hachoir.field.UInt8(self, "tag", "Tag"), self.tag_types
        )
        # Length and value are only present for Msg_Data
        if self["tag"].value == 1:
            yield hachoir.field.UInt16(self, "length", "Length")
            yield hachoir.field.Bytes(self, "value", self["length"].value)
Parsing
tlv = HachoirTLV(hachoir.stream.StringInputStream(TEST_DATA_DATA))
assert tlv["tag"].value == 1
assert tlv["length"].value == 2
assert tlv["value"].value == b'\xde\xad'
Generation
Not supported (there is an editor module to change parsed data, though).
Kaitai Struct
Specification
meta:
  id: tlv
  endian: be
seq:
  - id: tag
    type: u1
    enum: tag
  - id: len_value
    type: u2
    if: tag == tag::data
  - id: value
    size: len_value
    if: tag == tag::data
enums:
  tag:
    1: data
    3: error
The specification is translated to Python code using the Kaitai Struct compiler (ksc):
$ ksc --target python tlv.ksy
The resulting tlv.py file contains the parser. A support library (kaitaistruct) is required for it to work.
Parsing
tlv = kaitai.Tlv.from_bytes(TEST_DATA_DATA)
# Compare against the generated enum class rather than a member of the parsed value
assert tlv.tag == kaitai.Tlv.Tag.data
assert tlv.len_value == 2
assert tlv.value == b'\xde\xad'
Generation
Python Suitcase
https://github.com/digidotcom/python-suitcase
Specification
class SuitcaseTLV(Structure):
    tag = suitcase.fields.UBInt8()
    length = suitcase.fields.ConditionalField(
        suitcase.fields.LengthField(suitcase.fields.UBInt16()),
        lambda m: m.tag == 1
    )
    value = suitcase.fields.ConditionalField(
        suitcase.fields.Payload(length),
        lambda m: m.tag == 1
    )
Parsing
tlv = SuitcaseTLV()
tlv.unpack(TEST_DATA_DATA)
assert tlv.tag == 1
assert tlv.length == 2
assert tlv.value == b'\xde\xad'
Generation
tlv = SuitcaseTLV()
tlv.tag = 1
# Length field is calculated automatically
# When trying to set it, we get:
# suitcase.exceptions.SuitcaseProgrammingError:
# Cannot set the value of a LengthField
# tlv.length = 2
tlv.value = b'\xde\xad'
assert tlv.pack() == TEST_DATA_DATA
Scapy
Specification
class ScapyTLV(scapy.Packet):
    fields_desc = [
        scapy.ByteEnumField(
            "tag",
            0,
            {
                1: "DATA",
                3: "ERROR"
            }
        ),
        scapy.ConditionalField(
            scapy.FieldLenField("len", None, length_of="value"),
            lambda pkt: pkt.tag == 1
        ),
        scapy.ConditionalField(
            # Field name must match length_of above and the accesses below
            scapy.StrLenField(
                "value",
                "",
                length_from=lambda pkt: pkt.len
            ),
            lambda pkt: pkt.tag == 1
        )
    ]
Parsing
result = ScapyTLV(TEST_DATA_DATA)
assert result.tag == 1
assert result.len == 2
assert result.value == b'\xde\xad'
Generation
result = ScapyTLV(tag=1, value=b'\xde\xad')
assert scapy.raw(result) == TEST_DATA_DATA
Message Parser Design
Option 1.1
simulator = rflx.Simulator("tlv.rflx").tlv.message
message = simulator.parse(data)
Pro: Natural interface
Con: Static typing may be hard or impossible
Option 1.2
message = rflx.Simulator("tlv.rflx", ["TLV", "Message"], data)
Pro: Short, easier / more natural to use programmatically
Con: Generation is asymmetric; the same model cannot be reused with different data
Option 1.3
message = rflx.Simulator("tlv.rflx").tlv.message
message.unpack(data)
Pro: Symmetric interface possible, reuse of previously parsed message
Con: Stateful
Option 1.4
simulator = rflx.Simulator("tlv.rflx")["TLV"]["Message"]
message = simulator.parse(data)
Pro: Easier / more natural to use programmatically
Con: Static typing may be hard or impossible
Option 1.5
simulator = rflx.Simulator("tlv.rflx")
message = simulator.tlv.message(checksum=lambda x: crc(x))
message.parse(data)
Pro: The place where checksum functions (and later parameters) are passed is consistent with the specification; the package hierarchy in the spec is mirrored by the code
Con: Checksum needs to be passed whenever a message is constructed
Option 1.6
@rflx.simulator.from_file("tlv.rflx")
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(x: int) -> int:
        return crc(x)
simulator = MySimulator()
simulator.tlv.message.parse(data)
Alternative version with inline specification:
@rflx.simulator.from_string(
    """
    package TLV is
       type Tag is (Msg_Data => 1, Msg_Error => 3) with Size => 8;
       type Length is range 0 .. 2 ** 16 - 1 with Size => 16;
       type Message is
          message
             Tag : Tag
                then Length
                   if Tag = Msg_Data
                then null
                   if Tag = Msg_Error;
             Length : Length
                then Value
                   with Size => Length * 8;
             Value : Opaque;
          end message;
    end TLV;
    """
)
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(x: int) -> int:
        return crc(x)
Pro: Can be statically type-checked by mypy, central place for checksum functions, different child classes can be associated with different specs
Con: Name mangling of checksum functions may become confusing, more code
Conclusion
~~Implement 1.5~~ (mypy does not provide the hooks necessary to check this version)
Implement 1.6
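How a class decorator could bind a specification to a simulator class, and how a checksum function could be found through its mangled name, can be sketched as follows. This is only an illustration under stated assumptions: `Simulator`, `from_string` and `checksum_function` are hypothetical stand-ins rather than the RecordFlux API, the specification string is stored but not parsed, the checksum method takes `self` and `bytes`, and a byte sum stands in for a real CRC.

```python
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T", bound="Simulator")


class Simulator:
    """Hypothetical base class; holds the spec attached by the decorator."""

    specification: str = ""

    def checksum_function(
        self, package: str, message: str
    ) -> Optional[Callable[[bytes], int]]:
        # Name mangling: ("TLV", "Message") -> "tlv_message_checksum"
        name = f"{package.lower()}_{message.lower()}_checksum"
        return getattr(self, name, None)


def from_string(specification: str) -> Callable[[Type[T]], Type[T]]:
    """Class decorator that attaches a specification text to a subclass."""

    def decorate(cls: Type[T]) -> Type[T]:
        cls.specification = specification
        return cls

    return decorate


@from_string("package TLV is end TLV;")
class MySimulator(Simulator):
    def tlv_message_checksum(self, data: bytes) -> int:
        # Placeholder checksum (sum of bytes modulo 256), not a real CRC.
        return sum(data) % 256
```

Because the checksum functions are ordinary methods on a regular class, a type checker sees their signatures directly, which is the property that motivates choosing this option.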
Data Getter Design
Option 2.1
tag = message.tag
value = message.value
Pro: Better readability, natural to use
Con: RecordFlux names must be translated to Python names to avoid style-check issues
Option 2.2
tag = message["Tag"]
value = message["Value"]
Pro: Field names identical to spec, iteration over fields could be implemented on top
Con: Verbose
Option 2.3
tag = message.get("Tag")
value = message.get("Value")
Pro: Field names identical to spec
Con: Verbose
Conclusion
Implement 2.1
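The name translation that attribute-style access requires could look like the sketch below. It assumes that lowercasing the Ada-style identifiers is sufficient and that a trailing underscore is an acceptable way to avoid Python keywords; `python_name` and `Message` are hypothetical names for illustration, not RecordFlux code.

```python
import keyword
from typing import Any, Dict


def python_name(identifier: str) -> str:
    """Translate a RecordFlux identifier to a lowercase Python attribute name."""
    name = identifier.lower()
    # Avoid clashes with reserved words, e.g. a field called "Class".
    return name + "_" if keyword.iskeyword(name) else name


class Message:
    """Hypothetical read-only view on parsed field values."""

    def __init__(self, fields: Dict[str, Any]) -> None:
        self._fields = {python_name(key): value for key, value in fields.items()}

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]
        raise AttributeError(f"message has no field {name!r}")


message = Message({"Tag": 1, "Length": 2, "Value": b"\xde\xad"})
```

With this mapping, `message.tag` and `message.value` read the fields declared as `Tag` and `Value` in the specification.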
Data Setter Design
Option 3.1
message.tag = tag
message.value = value
Pro: Better readability, natural to use
Con: RecordFlux names must be translated to Python names to avoid style-check issues
Option 3.2
message["Tag"] = tag
message["Value"] = value
Pro: Field names identical to spec, iteration over fields could be implemented on top
Con: Verbose, calls need to be in the right order
Option 3.3
message.set("Tag", tag)
message.set("Value", value)
Pro: Field names identical to spec
Con: Verbose, calls need to be in the right order
Option 3.4
message.set({"Tag": tag, "Value": value})
Pro: Field names identical to spec, great flexibility
Con: Partial update may not be possible
~~Option 3.5~~
~~message = { "Tag": value }~~
Assignment cannot be overloaded in Python
Conclusion
Implement 3.1
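Attribute-style setters also fit the goal that calculated values are set automatically and cannot be set manually. A minimal sketch using plain Python properties, with a hypothetical `TLVMessage` class hard-coded for the TLV example (a real implementation would presumably derive this from the specification):

```python
class TLVMessage:
    """Hypothetical message object with a derived, read-only length."""

    def __init__(self) -> None:
        self.tag = 1
        self._value = b""

    @property
    def value(self) -> bytes:
        return self._value

    @value.setter
    def value(self, data: bytes) -> None:
        # Invariant from the spec: Length must fit into 16 bits.
        if len(data) > 2**16 - 1:
            raise ValueError("value does not fit the 16-bit Length field")
        self._value = data

    @property
    def length(self) -> int:
        # Derived from value; there is no setter, so assignment
        # raises AttributeError.
        return len(self._value)
```

Setting `value` updates `length` implicitly, while `message.length = 2` fails, comparable to the behaviour of the LengthField in Python Suitcase.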
Message Serializer Design
Option 4.1
data = message.serialize()
Pro: Natural interface
Con: (none listed)
Option 4.2
data = bytes(message)
Pro: Natural interface, very pythonic
Con: (none listed)
Conclusion
Implement 4.2
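Python supports `bytes(message)` through the `__bytes__` special method. A minimal sketch for the TLV example, assuming a hypothetical `TLVMessage` class with hard-coded field layout (tag values taken from the specification):

```python
import struct


class TLVMessage:
    """Hypothetical message object that serializes via bytes()."""

    def __init__(self, tag: int, value: bytes = b"") -> None:
        self.tag = tag
        self.value = value

    def __bytes__(self) -> bytes:
        if self.tag == 3:
            # Msg_Error: the message consists of the tag only.
            return struct.pack(">B", self.tag)
        # Msg_Data: 8-bit tag, 16-bit big-endian length, then the value.
        return struct.pack(">BH", self.tag, len(self.value)) + self.value
```

An object like this can be passed directly to APIs that expect a bytes-like argument after conversion, for example `socket.send(bytes(message))`.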
Checksum Design
Option 5.1
message.checksum = lambda x: crc(x)
Pro: Natural interface
Con: Must be set per message, not suitable for parsing
Option 5.2
simulator = rflx.Simulator(
    "tlv.rflx",
    checksums={
        'TLV': {
            'Message': {
                'checksum': lambda x: crc(x)
            }
        }
    }
).tlv.message
message = simulator.parse(data)
Pro: Checksum is set only once per simulator instance
Con: TLV.Message is addressed in two distinct places / ways
Option 5.3
Cf. 1.5
Conclusion
Implement 5.3
Enumeration Literals Design
Option 6.1
if message.tag == 3:
    pass
Pro: Compatible with int
Con: Error-prone, user needs to map enumeration literals to integers manually
Option 6.2
if message.tag == simulator.tlv.msg_error:
    pass
Pro: Compatible with int (when based on IntEnum)
Con: (none listed)
Conclusion
Implement 6.2
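The int compatibility mentioned for this option comes from `enum.IntEnum` in the standard library. A minimal sketch with a hand-written enumeration for the TLV tags; the class name `Tag`, the member names and the helper `describe` are assumptions for illustration, whereas a real implementation would presumably generate the enumeration from the specification.

```python
import enum


class Tag(enum.IntEnum):
    """Hand-written counterpart of the Tag type in the TLV specification."""

    MSG_DATA = 1
    MSG_ERROR = 3


def describe(raw: int) -> str:
    # Tag(raw) raises ValueError for values outside the enumeration.
    tag = Tag(raw)
    if tag == Tag.MSG_ERROR:
        return "error"
    return "data"
```

Members compare equal to plain integers (`Tag.MSG_ERROR == 3`), so code written against option 6.1 keeps working, while invalid values such as 2 are rejected on conversion.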
Summary - Message Parser
@rflx.simulator.from_file("tlv.rflx")
class MySimulator(rflx.simulator.Simulator):
    def tlv_message_checksum(x: int) -> int:
        return crc(x)
simulator = MySimulator()
simulator.tlv.message.parse(data)
tag = simulator.tlv.message.tag
value = simulator.tlv.message.value
if tag == simulator.tlv.msg_error:
    simulator.tlv.message.tag = new_tag
socket.send(bytes(simulator.tlv.message))
Related Work - State Machines
Pysmlib
https://darcato.github.io/pysmlib/docs/html/index.html
FiniteStateMachines
https://github.com/jaypantone/FiniteStateMachines
PythonStateMachine
https://python-statemachine.readthedocs.io/en/latest/index.html
Transitions
https://github.com/pytransitions/transitions
PySM
https://pysm.readthedocs.io/en/latest/#