Construct export tool (a compiler target)

Open arekbulski opened this issue 6 years ago • 150 comments

After investigating the issue more closely (and I admit that I am far from implementing it...), I came to the conclusion that the Construct->Kaitai tool should be implemented on the Construct side, but the Kaitai->Construct export tool should be implemented on your side. I will need some assistance with it, namely with the compiler, since I don't speak Scala at all. I can provide you with examples of what the compiler should emit, but I probably won't be able to write Scala code for it.

I foresee one (admittedly huge) problem: the Construct API is not stable (at least at the moment). This mostly affects classes that are brand new, but there were also some changes to classes that existed before I took over the project. If it would take Kaitai a year to ship changes, that's kind of a problem. I suppose users could use the newest version from GitHub (build the compiler from sources), which would mitigate, if not solve, the issue.

I attach the map of completion, to be edited later (updated manually): 38455241-3b627b64-3a6d-11e8-8fe3-19726c3ef877

Current translations and CI results (updated automatically): https://github.com/kaitai-io/ci_targets/tree/master/compiled/construct http://kaitai.io/ci/

arekbulski avatar Mar 12 '18 17:03 arekbulski

I suspect that a Construct target would be very different from what's usually generated, i.e. we'll need to emit declarative constructions, not procedural statements. That's totally possible too, though.

https://github.com/kaitai-io/kaitai_struct/issues/253 is progressing smoothly, so chances are it will be much easier to do all the tests once it is completed. Actually, it can be used right now; we just need to add the relevant code to generate tests. Python's translator should probably be fine as is?

GreyCat avatar Mar 12 '18 17:03 GreyCat

I will start working on both import/export tools then, and should deliver a first batch of translated examples within, say, a few days. Can I count on you to handle writing the compiler Scala code?

To return the favor, I will get on top of some other work items that I assigned to myself and that have been stuck in the backlog, most importantly the Construct->Kaitai export tool, which will be implemented entirely on the Construct side. Actually it's very similar to the compiler infrastructure, except that the pyYAML module would generate the output instead of a custom text-writer class.

arekbulski avatar Mar 13 '18 12:03 arekbulski

Can I count on you to handle writing the compiler Scala code?

Sure :)

GreyCat avatar Mar 13 '18 18:03 GreyCat

Would it be more convenient if structs were created kind of imperatively like this?

s = []
s.append('fieldname' / Int8ub)
return Struct(*s)

instead of proper syntax?

Struct(
    'fieldname' / Int8ub,
)

arekbulski avatar Mar 14 '18 15:03 arekbulski

It probably won't matter, as this would be a whole new Compiler, akin to GraphVizCompiler, which would not use the existing templating mechanism (= it does not need to fit into the existing header-contents-footer workflow).

GreyCat avatar Mar 14 '18 15:03 GreyCat

Could the construct-compiler (let's call it the export tool?) have a weekly release cycle?

arekbulski avatar Mar 15 '18 17:03 arekbulski

I don't quite understand what you call a "release cycle". At least in the foreseeable future, "releasing" will probably remain a relatively time-consuming and thus infrequent task, mostly because it means lots of PR, writing announcements, posting them on news sites, etc.

On the other hand, if you just want current builds, they're available right now, both for Debian & Windows. Would that qualify as a "wait for 5-7 minutes after the commit" release cycle?

GreyCat avatar Mar 16 '18 00:03 GreyCat

Yes, that is exactly what I meant. So this solves the issue of Construct having an unstable API.

arekbulski avatar Mar 16 '18 10:03 arekbulski

Progress report on the export tool (Construct->Kaitai):

    def test_exportksy():
        d = Struct(
            "num1" / BytesInteger(4),
            "num2" / Int32ub,
            "num2" / Float32b,
            "data1" / Bytes(4),
            "data2" / GreedyBytes,
            "array2d" / Array(5, Array(5, BytesInteger(1)))
        )
        d.export_ksy()
--------------------------------------- Captured stdout call ----------------------------------------
meta:
  id: unnamed_schema
seq:
- id: num1
  type: u4be
- id: num2
  type: u4be
- id: num2
  type: f4be
- id: data1
  size: 4
- id: data2
  size-eos: true
- id: array2d
  repeat: expr
  repeat-expr: 5
  type: type_0
types:
  type_0:
    seq:
    - id: x
      repeat: expr
      repeat-expr: 5
      type: u1be

arekbulski avatar Mar 16 '18 21:03 arekbulski
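
The output above wraps the inner array in a generated subtype, since a single ksy attribute carries only one repeat. A minimal sketch of how such a mapping might work follows. It uses hypothetical stand-in classes (`Int`, `RawBytes`, `Repeat`) and a hypothetical `to_ksy` function; it is not Construct's actual `export_ksy` implementation. One deliberate difference from the captured output: one-byte integers are emitted as `u1`, because ksy one-byte types take no endianness suffix.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for Construct classes; NOT the real Construct API.
@dataclass
class Int:
    size: int  # width in bytes, unsigned, big-endian

@dataclass
class RawBytes:
    size: Optional[int]  # None means "read until end of stream"

@dataclass
class Repeat:
    count: int
    item: object  # Int, RawBytes or another Repeat


def to_ksy(fields, schema_id="unnamed_schema"):
    """Map a list of (name, node) pairs to a ksy-shaped dict."""
    types = {}

    def attr(name, node):
        spec = {"id": name}
        if isinstance(node, Int):
            # One-byte integers have no endianness suffix in ksy.
            spec["type"] = "u%d%s" % (node.size, "" if node.size == 1 else "be")
        elif isinstance(node, RawBytes):
            if node.size is None:
                spec["size-eos"] = True
            else:
                spec["size"] = node.size
        elif isinstance(node, Repeat):
            if isinstance(node.item, Repeat):
                # Nested repeat: move the inner one into a generated subtype.
                type_name = "type_%d" % len(types)
                types[type_name] = {}  # reserve the name before recursing
                types[type_name] = {"seq": [attr("x", node.item)]}
                spec["type"] = type_name
            else:
                # Simple item: reuse its type/size keys on this attribute.
                inner = attr(name, node.item)
                spec.update(inner)
            spec["repeat"] = "expr"
            spec["repeat-expr"] = node.count
        return spec

    doc = {"meta": {"id": schema_id}, "seq": [attr(n, f) for n, f in fields]}
    if types:
        doc["types"] = types
    return doc


doc = to_ksy([
    ("num1", Int(4)),
    ("data2", RawBytes(None)),
    ("array2d", Repeat(5, Repeat(5, Int(1)))),
])
```

Serializing `doc` with a YAML library would then give a document shaped like the one captured above.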

Switched from PyYAML to ruamel.yaml; it fixed the order of keys.

arekbulski avatar Mar 16 '18 22:03 arekbulski

Would this be a valid analog of Prefixed?

- id: lengthfield
  type: u4le
- id: data
  size: lengthfield
  type: xxx

arekbulski avatar Mar 16 '18 22:03 arekbulski

If that's going to be just a byte array, just drop type: xxx; that would give a raw byte array.

GreyCat avatar Mar 16 '18 22:03 GreyCat

No, that would be the Bytes analog; I am asking about Prefixed. https://construct.readthedocs.io/en/latest/api/tunneling.html#construct.Prefixed

arekbulski avatar Mar 16 '18 22:03 arekbulski

Um, then just use what you've proposed — it should work ;)

GreyCat avatar Mar 16 '18 22:03 GreyCat
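
For illustration, the two-attribute layout proposed above (a `u4le` length followed by `size: lengthfield`) corresponds to the read logic below. This is a standard-library sketch, not code generated by the Kaitai compiler, and `read_prefixed` is a hypothetical helper name.

```python
import io
import struct


def read_prefixed(stream):
    """Read a little-endian u4 length, then exactly that many payload bytes."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("stream ended inside the length field")
    (length,) = struct.unpack("<I", header)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("stream ended inside the payload")
    return length, data


stream = io.BytesIO(b"\x03\x00\x00\x00abcXYZ")
length, data = read_prefixed(stream)
# The payload is limited to `length` bytes; whatever follows stays unread.
rest = stream.read()
```

With `type: xxx` present, the `length` payload bytes would additionally be parsed as that type instead of being returned raw.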

Do I get it right that fields can have a doc tag but an entire struct cannot?

arekbulski avatar Mar 16 '18 22:03 arekbulski

Typespec (that's probably what you call "entire struct") can have doc + doc-ref.

GreyCat avatar Mar 16 '18 22:03 GreyCat

I was thinking of something on the meta level, but the closest thing I found was title. A typespec would, I think, be a nested struct, not the outermost struct.

arekbulski avatar Mar 16 '18 22:03 arekbulski

Uhm... what would be the analog of Pass? https://construct.readthedocs.io/en/latest/api/streaming.html#construct.Pass

arekbulski avatar Mar 16 '18 22:03 arekbulski

There probably won't be a direct equivalent. The default case for switch is _. There is no "default case" for enums, as (1) they work quite differently from the Construct implementation, i.e. they don't convert integers <-> strings, but integers <-> constants, and (2) currently their implementation is language-dependent.

GreyCat avatar Mar 17 '18 11:03 GreyCat
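
The integer <-> constant mapping can be pictured with Python's standard `enum` module. This is only an analogy: the `Protocol` enum and its values are made up for illustration, and the snippet does not show how any Kaitai target actually implements enums.

```python
import enum


class Protocol(enum.IntEnum):
    ICMP = 1
    TCP = 6
    UDP = 17


known = Protocol(6)         # integer -> named constant
as_int = int(Protocol.UDP)  # named constant -> integer

# There is no built-in default member: an unmapped integer is rejected.
try:
    Protocol(99)
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```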

Would size: 0 be a valid attributespec?

arekbulski avatar Mar 17 '18 15:03 arekbulski

From KS point of view, definitely yes. From individual languages point of view, I'm not sure that all of them allow zero-sized arrays, but probably most do.

GreyCat avatar Mar 17 '18 15:03 GreyCat
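
In Python, at least, a zero-sized read is unproblematic, as this standard-library sketch shows. It says nothing about the other target languages.

```python
import io

stream = io.BytesIO(b"abc")
empty = stream.read(0)         # zero-sized read yields an empty bytes object
position_after = stream.tell() # the stream position does not move
```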

@GreyCat The docs say that AttributeSpec tags "must" come in specified order. Is that really a hard requirement or just a way of emphasizing the importance of a style guide?

arekbulski avatar Mar 17 '18 21:03 arekbulski

A "style guide" is called a style guide because it enforces a certain style: where several different behaviors are technically possible, we suggest one to follow. If someone does not do what the style guide suggests, the result fails to comply with the style guide, but the code could still be compilable. If someone does not do what the language reference dictates, it will most likely result in a compilation error.

So, the short answer: tags inside an attribute spec can come in any order. The compiler actually has no way to even know that order; it just gets the attribute as an unordered map.

GreyCat avatar Mar 17 '18 21:03 GreyCat
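
A small Python sketch of why key order cannot matter to a consumer that reads an attribute as a mapping. Plain dicts stand in for the parsed YAML here; the attribute contents and the `describe` helper are illustrative only.

```python
# Two spellings of the same attribute, differing only in key order.
style_compliant = {"id": "data", "type": "u4le", "repeat": "eos"}
shuffled = {"repeat": "eos", "type": "u4le", "id": "data"}

# Mapping equality ignores insertion order, so both describe one attribute.
same_attribute = style_compliant == shuffled


def describe(attr):
    """Look keys up by name, never by position."""
    return "%s: %s (repeat=%s)" % (attr["id"], attr["type"], attr.get("repeat"))
```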

Alright, for the record this is what the style guide says: "When specifying an attribute, one MUST use the following order of keys"

arekbulski avatar Mar 17 '18 22:03 arekbulski

Exactly. That "MUST" is to be interpreted as "in order to make style-compliant ksy, you must do that".

GreyCat avatar Mar 17 '18 22:03 GreyCat

I guess the order should be user-controllable by passing 3 lists, one is for properties (in seq, instances, enums, params), one is for types, and one is for meta. For example ["id", "-orig-id", "type", ...] if we wanna comply with the style guide.

KOLANICH avatar Mar 18 '18 06:03 KOLANICH

I'm not sure that style compliance is worth pursuing, at least now. For example, PyYAML generates arrays-in-maps like that:

seq:
- id: foo
  type: bar

while style guide suggests:

seq:
  - id: foo
    type: bar

GreyCat avatar Mar 18 '18 09:03 GreyCat

It's not so much about style compliance; it's just inconvenient when things come in a different order, or even in a random order from run to run (this has an especially nasty effect on diffs and testing). That's why the order is specified in the style guide, I guess. I mean, though we may have some trouble complying with the style guide, we need to have an order, any order convenient for the user.

BTW, ruamel.yaml, which is used now, allows the style prescribed by the style guide.

KOLANICH avatar Mar 18 '18 10:03 KOLANICH

I agree that a non-deterministic order would be problematic for diffs. The tool exports keys in a deterministic order, which is also style-compliant as far as I have seen. This issue is considered solved.

I will fix the indentation, thanks to @KOLANICH.

arekbulski avatar Mar 18 '18 13:03 arekbulski
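
A minimal sketch of ordering keys deterministically before emitting them, along the lines of the user-supplied list suggested earlier in the thread. `KEY_ORDER` is a hypothetical priority list, not the style guide's authoritative order, and `order_keys` is a made-up helper. Keys missing from the list fall back to alphabetical order so that the output stays stable from run to run.

```python
# Hypothetical priority list; the real style-guide order may differ.
KEY_ORDER = ["id", "-orig-id", "type", "repeat", "repeat-expr", "size", "size-eos"]


def order_keys(attr, key_order=KEY_ORDER):
    """Return a new dict whose keys follow key_order, unknown keys last."""
    rank = {key: i for i, key in enumerate(key_order)}

    def sort_key(key):
        # Known keys sort by rank; unknown keys share one rank and then
        # sort alphabetically among themselves.
        return (rank.get(key, len(key_order)), key)

    return {key: attr[key] for key in sorted(attr, key=sort_key)}


attr = {"repeat-expr": 5, "type": "u1", "zzz-custom": 1, "id": "x", "repeat": "expr"}
ordered = list(order_keys(attr))
```

Because Python dicts preserve insertion order, the reordered dict can be handed to an emitter that writes keys in iteration order.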

What is the analog of Flag (boolean), u1 and b1?

arekbulski avatar Mar 18 '18 13:03 arekbulski