kaitai_struct
Construct export tool (a compiler target)
After investigating the issue a bit more closely (and I admit that I am far from implementing it...), I came to the conclusion that the Construct->Kaitai tool should be implemented on the Construct side, but the Kaitai->Construct export tool should be implemented on your side. I will need some assistance with it, namely with the compiler, since I don't speak Scala at all. I can provide you with examples of what the compiler should spew out, but I probably won't be able to write Scala code for it.
I foresee one (admittedly huge) problem: the Construct API is not stable (at least at the moment). This seems to mostly affect classes that are brand new, but there were also some changes to classes that existed before I took over the project. If it takes Kaitai a year to ship changes, that's kind of a problem. I suppose users could use the newest version from GitHub (build the compiler from sources), which would mitigate if not solve the issue.
I attach the map of completion, to be edited later (updated manually):
Current translations and CI results (updated automatically): https://github.com/kaitai-io/ci_targets/tree/master/compiled/construct http://kaitai.io/ci/
I suspect that Construct target would be very different from what's usually generated, i.e. we'll need to match declarative constructions, not procedural statements. That's totally possible too, though.
https://github.com/kaitai-io/kaitai_struct/issues/253 is progressing smoothly, so chances are it will be much easier to do all the tests once it is completed. Actually, it can be used right now; we just need to add the relevant code to generate tests. Python's translator should probably be fine as is?
I will start working on both import/export tools then, and should deliver a first batch of translated examples within, say, a few days. Can I count on you to handle writing the compiler Scala code?
To return the favor, I will get on top of some other work items that I assigned to myself and that have been stuck in the backlog, most importantly the Construct->Kaitai export tool, which will be implemented entirely on the Construct side. Actually it's very similar to the compiler infrastructure, except that the pyYAML module would generate the output instead of a custom text-writer class.
Can I count on you to handle writing the compiler Scala code?
Sure :)
Would it be more convenient if structs were created kind of imperatively like this?
s = []
s.append('fieldname' / Int8ub)
return Struct(*s)
instead of proper syntax?
Struct(
    'fieldname' / Int8ub,
)
Probably it won't matter, as this would be a whole new Compiler, akin to GraphVizCompiler, which would not use the existing templating mechanism (= does not need to fit into the existing header-contents-footer workflow).
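To illustrate what such a compiler might "spew out", here is a minimal Python sketch that turns a ksy-like seq into the text of a declarative Struct(...) definition. The function name emit_struct and the KSY_TO_CONSTRUCT table are hypothetical and cover only a handful of primitive types; the real compiler would be written in Scala and would have to handle far more.

```python
# Hypothetical, partial mapping from ksy primitive types to Construct names.
KSY_TO_CONSTRUCT = {
    "u1": "Int8ub",
    "u2be": "Int16ub",
    "u4be": "Int32ub",
    "u4le": "Int32ul",
    "f4be": "Float32b",
}


def emit_struct(seq, indent="    "):
    """Return Construct source text for a list of ksy-like attributes.

    Each attribute is a dict with at least "id" and "type" keys.
    """
    lines = ["Struct("]
    for attr in seq:
        construct_name = KSY_TO_CONSTRUCT[attr["type"]]
        # repr() quotes the field name the way it appears in Python source.
        lines.append(f"{indent}{attr['id']!r} / {construct_name},")
    lines.append(")")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_struct([{"id": "fieldname", "type": "u1"}]))
```

Because the output is a single declarative expression, nothing here needs the imperative s.append(...) form; the generator only has to emit one line per attribute.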
Could the construct-compiler (let's call it the export tool?) have a weekly release cycle?
I don't quite understand what you call a "release cycle". At least in the near foreseeable future, "releasing" will probably remain a relatively time-consuming and thus uncommon task, mostly because it means lots of PR, writing announcements, posting them on news sites, etc.
On the other hand, if you just want current builds, they're available right now, both for Debian & Windows. Would that qualify for "wait for 5-7 minutes after the commit" release cycle?
Yes that is exactly what I meant. So this solves the issue of Construct having unstable API.
Progress report on the import tool:
def test_exportksy():
    d = Struct(
        "num1" / BytesInteger(4),
        "num2" / Int32ub,
        "num2" / Float32b,
        "data1" / Bytes(4),
        "data2" / GreedyBytes,
        "array2d" / Array(5, Array(5, BytesInteger(1)))
    )
    d.export_ksy()
--------------------------------------- Captured stdout call ----------------------------------------
meta:
  id: unnamed_schema
seq:
- id: num1
  type: u4be
- id: num2
  type: u4be
- id: num2
  type: f4be
- id: data1
  size: 4
- id: data2
  size-eos: true
- id: array2d
  repeat: expr
  repeat-expr: 5
  type: type_0
types:
  type_0:
    seq:
    - id: x
      repeat: expr
      repeat-expr: 5
      type: u1be
Switched from pyYAML to ruamel.yaml; that fixed the order of keys.
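The type names in that output follow a regular pattern, so the translation of fixed-width numeric fields can be sketched by parsing the Construct name. This is only an illustration: ksy_type is a hypothetical helper, it works on names as strings rather than on real Construct objects, and it ignores the native-endian variants. It also assumes the usual Kaitai convention that one-byte integers are spelled u1/s1 with no endianness suffix.

```python
import re


def ksy_type(construct_name: str) -> str:
    """Map a Construct numeric type name (e.g. "Int32ub") to a ksy type name."""
    m = re.fullmatch(r"Int(8|16|32|64)([us])([bl])", construct_name)
    if m:
        bits, sign, endian = m.groups()
        size = int(bits) // 8
        if size == 1:
            # Single bytes have no byte order, so no "be"/"le" suffix.
            return f"{sign}1"
        return f"{sign}{size}{endian}e"

    m = re.fullmatch(r"Float(32|64)([bl])", construct_name)
    if m:
        bits, endian = m.groups()
        return f"f{int(bits) // 8}{endian}e"

    raise ValueError(f"no ksy equivalent known for {construct_name!r}")


if __name__ == "__main__":
    for name in ("Int32ub", "Float32b", "Int8ub", "Int16sl"):
        print(name, "->", ksy_type(name))
```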
Would that be valid analog to Prefixed?
- id: lengthfield
  type: u4le
- id: data
  size: lengthfield
  type: xxx
If that's going to be just a byte array, just drop type: xxx, that would be a raw byte array.
No, that would be Bytes analog, I am asking about Prefixed. https://construct.readthedocs.io/en/latest/api/tunneling.html#construct.Prefixed
Um, then just use what you've proposed — it should work ;)
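For reference, the byte layout being discussed (a u4le length followed by that many bytes, which then form a bounded substream for the inner type) can be sketched with nothing but the standard library. parse_prefixed is a hypothetical helper for illustration, not part of Construct or Kaitai Struct.

```python
import struct


def parse_prefixed(buf: bytes, offset: int = 0):
    """Read a little-endian u4 length, then return (payload, next_offset).

    The payload is the bounded region that an inner type would parse.
    """
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("declared length runs past the end of the buffer")
    return buf[start:end], end


if __name__ == "__main__":
    payload, next_offset = parse_prefixed(b"\x03\x00\x00\x00abcXYZ")
    print(payload, next_offset)
```

The important property is that whatever parses the payload cannot read past its end, which is the behaviour both Prefixed and a size-limited typed field are meant to give.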
Do I get it right, that fields can have a doc tag but the entire struct cannot?
Typespec (that's probably what you call "entire struct") can have doc + doc-ref.
I was thinking of something on the meta level, but the closest thing I found was title.
Typespec would I think be a nested struct, not outer-most struct.
Uhm... what would be the analog of Pass? https://construct.readthedocs.io/en/latest/api/streaming.html#construct.Pass
Probably there won't be a direct equivalent. The default case for switch is _. There is no "default case" for enums, as (1) they work pretty differently from the Construct implementation, i.e. they don't convert integers <-> strings, but integers <-> constants, and (2) currently their implementation is language-dependent.
Would size: 0 be a valid attributespec?
From KS point of view, definitely yes. From individual languages point of view, I'm not sure that all of them allow zero-sized arrays, but probably most do.
@GreyCat The docs say that AttributeSpec tags "must" come in specified order. Is that really a hard requirement or just a way of emphasizing the importance of a style guide?
"Style guide" is called style guide for a reason of enforcing certain style: where several different behaviors are technically possible, we suggest one to follow. If someone does not do what style guide suggests, it fails to comply with style guide, but the code could be still compilable. If someone does not do what language reference dictates, most likely it will result in compilation error.
So, the short answer: tags inside attribute spec can come in any order. Compiler actually has no way to even know of that order, it just gets it as unordered map.
Alright, for the record this is what the style guide says: "When specifying an attribute, one MUST use the following order of keys"
Exactly. That "MUST" is to be interpreted as "in order to make style-compliant ksy, you must do that".
I guess the order should be user-controllable by passing 3 lists: one for properties (in seq, instances, enums, params), one for types, and one for meta.
For example ["id", "-orig-id", "type", ...] if we wanna comply with the style guide.
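A minimal sketch of that idea, assuming the preferred order is passed in as a plain list: keys named in the list come first in that order, and any remaining keys follow alphabetically so the result is the same on every run. order_keys is a hypothetical helper, and it relies on Python dicts preserving insertion order (Python 3.7+).

```python
def order_keys(mapping, preferred):
    """Return a new dict with keys sorted by a user-supplied preference list.

    Keys missing from `preferred` are placed after the listed ones,
    in alphabetical order, so the output is deterministic.
    """
    rank = {key: index for index, key in enumerate(preferred)}
    unlisted = len(rank)
    ordered = sorted(
        mapping.items(),
        key=lambda item: (rank.get(item[0], unlisted), item[0]),
    )
    return dict(ordered)


if __name__ == "__main__":
    attr = {"type": "u4be", "size": 4, "id": "x", "doc": "example"}
    print(list(order_keys(attr, ["id", "-orig-id", "type"])))
```

One such list per level (properties, types, meta) would be enough to cover the three cases mentioned above.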
I'm not sure that style compliance is worth pursuing, at least now. For example, PyYAML generates arrays-in-maps like that:
seq:
- id: foo
  type: bar
while style guide suggests:
seq:
  - id: foo
    type: bar
It's not so much about style compliance; it's just inconvenient when things come in a different order, or even in a random order from run to run (this has an especially nasty effect on diffs and testing). That's why the order is specified in the style guide, I guess. I mean, though we may have some trouble complying with the style guide, we need to have an order, any order that is convenient for the user.
BTW, ruamel.YAML, which is used now, allows the style prescribed by the style guide.
I agree that non-deterministic order would be problematic for diffs. The tool exports keys in a deterministic order, and a style-compliant one as far as I have seen. This issue is considered solved.
I will fix the indentation, thanks to @KOLANICH .
What is the analog of Flag (boolean), u1 and b1?