Enabling ecosystem growth?

Open woodrowbarlow opened this issue 2 years ago • 4 comments

I'm sure the Kaitai maintainers have noticed that there is a large pocket of Kaitai users, or devs who would be Kaitai users, who are not the original target audience (reverse engineers). These users are deploying their own binary formats and they want to use .ksy files as their format's source definition, and benefit from automatic tooling similar to how web developers use OpenAPI. You can see this faction in requests for features like automatic serialization and Wireshark plugins.

I actually wrote a blog post on this topic, and there was some good discussion in the Hacker News comments linked at the bottom. https://anachronauts.club/~voidstar/log/2022-03-24-openapi-for-binfmt.gmi

At the moment, the only practical way to contribute to the ecosystem of tools that can work with ksy files is to submit a pull request to the Kaitai compiler itself. Creating an independent tool would mean re-doing much of the work (especially the pre-compile passes) that the compiler already does -- in every single tool. (Tools written in JVM languages can, at least, call into the Scala codebase. But other languages are out of luck.)

This creates a centralized bottleneck for ecosystem growth. Ideally, I could write my own tool in Python that works with a .ksy to generate test cases or something... but unless I'm willing to re-implement a lot of the Kaitai compiler, I won't be able to determine the size of a user-defined type, evaluate expressions, etc.

Is there a plan to tackle this? E.g.:

  • Kaitai compiler adds a plugin API; launch plugins as a subprocess and communicate over a pipe so the plugin can be written in any language.
  • Some sort of two-stage process: .ksy files get compiled into something more verbose, and then those get compiled into various language targets from there.
  • The Kaitai core library gets written in a native compiled language, and packaged with thin FFI wrappers in a myriad of languages so that devs in many languages can write tools that use the compiler.
  • Maybe something like gRPC's reflection protocol?
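
To make the first option concrete, here is a minimal sketch of a host process talking to a plugin over a pipe. Everything in it is an assumption for illustration: the JSON message shape, the `run_plugin` helper, and the inline plugin are all hypothetical, and nothing like this exists in the Kaitai compiler today. The plugin is a Python one-off only so the example runs anywhere; the point is that it could be any executable that reads stdin and writes stdout.

```python
import json
import subprocess
import sys

# Hypothetical plugin: reads a JSON description of a type from stdin and
# writes the ids of its fields to stdout. It stands in for a program that
# could be written in any language.
PLUGIN_SOURCE = r"""
import json, sys
spec = json.load(sys.stdin)
json.dump({"fields": [f["id"] for f in spec["seq"]]}, sys.stdout)
"""


def run_plugin(spec: dict) -> dict:
    """Send `spec` to the plugin over a pipe and return its JSON reply."""
    proc = subprocess.run(
        [sys.executable, "-c", PLUGIN_SOURCE],
        input=json.dumps(spec),
        capture_output=True,
        text=True,
        check=True,  # raise if the plugin exits with a non-zero status
    )
    return json.loads(proc.stdout)


# Hand-written stand-in for what a compiler might send after its own passes
# (assumed message shape, not a real compiler output).
example_spec = {
    "meta": {"id": "packet"},
    "seq": [{"id": "magic", "type": "u4"}, {"id": "length", "type": "u2"}],
}

reply = run_plugin(example_spec)
print(reply)
```

The host only needs to agree with the plugin on the message format, which is why the format itself would be the hard design question here.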

I'm just interested to know if something like this is on the team's radar / roadmap, or even compatible with the vision.

woodrowbarlow avatar Mar 30 '22 00:03 woodrowbarlow

The biggest problem at the moment is that PRs just aren't reviewed or merged by the maintainers, which means the community is effectively dead.

I've started rewriting the compiler in Rust with a more practical licence, but there is still a lot of work to do. I don't want to split the community and introduce KSY language dialects, but the way some things are implemented makes me sad. Maintaining full compatibility with the original compiler just complicates things for no reason and kills my motivation. And an effectively dead community doesn't allow me to change anything in the original compiler.

I plan to make a reusable compiler, and even now some parts are good enough (the parser module); I even use it in one of my toy projects to generate KSY. I also have a prototype for a Java target, but it was never published because it is at a very early stage. The original compiler is very complicated: it forces the user to implement 3-4-5 classes whose purposes are not very clear. Having a clear interface (just give me a checked KSY AST) would simplify things a lot, which is what I had in mind when I started my project.

I think that is what we really need. IMHO, adding plugin support to the original compiler would be a buggy and complicated thing; it is easier to just rewrite the whole compiler from scratch, which is what working on my project has convinced me of. At the same time, it would also be reasonable to correct the shortcomings of the KSY specification, which is where I am currently stuck, since these corrections are simply not being discussed.

Mingun avatar Mar 30 '22 05:03 Mingun

Ah, I recognize that blog post! Hi 🙂

I would say that the target audience and goals of Kaitai Struct are actually much broader than you think they are 🙂 Reverse engineering is certainly one of the main use cases, but not the only one - Kaitai Struct also works very well for parsing known and documented data formats. Using a Kaitai Struct spec as the primary specification of a binary format is also definitely possible (and I think people have done so already, but I can't remember any concrete examples right now). Serialization has also been a goal from the beginning as far as I know (that is one of the reasons why Kaitai Struct is a declarative language), it simply hasn't been implemented yet. (Although there is an incomplete implementation targeting Java, which supports most simple features already.)

I agree that it's currently a bit difficult to create new tools that work with KSYs, because there's no easy way to extend/hook into the compiler, so you have to implement expression parsing, name resolution, type checking, imports, etc. yourself.

(Tools written in JVM languages can, at least, call into the Scala codebase. But other languages are out of luck.)

There is Scala Native, which can compile Scala code to a native binary and allows calling native code from Scala. Calling Scala code from native code is not supported yet, but is apparently being worked on (see scala-native/scala-native#155), so that may be an option in the future. That said, it seems that KSC currently would not be fully compatible with Scala Native - see #519.

  • Kaitai compiler adds a plugin API; launch plugins as a subprocess and communicate over a pipe so the plugin can be written in any language.

This would definitely be possible, but would probably require quite a bit of work in the compiler. Unfortunately, such larger changes are unlikely to happen anytime soon - as already mentioned, we have a bit of a maintainer bandwidth problem 🙁 I actually have commit rights myself, but don't have that much time to work on the project, thanks to having a full-time job now (first-world problems...). Plus, I don't know the compiler codebase well enough to confidently review (or make) major changes to the compiler. @GreyCat and @generalmimon know the compiler better, but also only have limited time.

  • Some sort of two-stage process: .ksy files get compiled into something more verbose, and then those get compiled into various language targets from there.

This would probably be the most realistic option - add a compiler target that outputs the already parsed/analyzed structure of the input specs in a JSON format or something. This would of course also require changes to the compiler, but would be much less work than a full plugin API.
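
As a rough illustration of what an external tool could do with such output, here is a sketch that computes the size of a user-defined type from a JSON dump. The dump format (`types`, `seq`, already-resolved type names) is purely an assumption made up for this example; the compiler has no such target at the moment, and a real one would also have to represent expressions, repetitions, conditionals and so on.

```python
import json

# Hypothetical compiler output: every type name is already resolved and all
# fields have a fixed size. This format is invented for the example.
RESOLVED_JSON = """
{
  "types": {
    "header": {"seq": [{"id": "magic", "type": "u4"},
                       {"id": "version", "type": "u2"}]},
    "packet": {"seq": [{"id": "hdr", "type": "header"},
                       {"id": "flags", "type": "u1"}]}
  }
}
"""

# Byte sizes of the fixed-width primitive types used in this sketch.
PRIMITIVE_SIZES = {
    "u1": 1, "u2": 2, "u4": 4, "u8": 8,
    "s1": 1, "s2": 2, "s4": 4, "s8": 8,
    "f4": 4, "f8": 8,
}


def size_of(type_name: str, types: dict) -> int:
    """Return the size in bytes of a primitive or user-defined type."""
    if type_name in PRIMITIVE_SIZES:
        return PRIMITIVE_SIZES[type_name]
    # User-defined type: sum the sizes of its fields, recursing as needed.
    return sum(size_of(field["type"], types) for field in types[type_name]["seq"])


types = json.loads(RESOLVED_JSON)["types"]
packet_size = size_of("packet", types)
print(packet_size)
```

The tool stays this small only because name resolution and type checking are assumed to have happened before the dump was written, which is exactly the work external tools cannot reuse today.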

  • The Kaitai core library gets written in a native compiled language, and packaged with thin FFI wrappers in a myriad of languages so that devs in many languages can write tools that use the compiler.

This is almost guaranteed to not happen, at least not officially by the Kaitai project. Of course people are free to reimplement KSC themselves in other languages (and in fact multiple people are already doing so, including @Mingun above). Though I personally don't think rewrites like this are worth the effort - everyone has their own favorite language ecosystem, each with its own advantages, and switching from one to the other is not going to hugely improve Kaitai Struct for the average user.

Actually, Scala is already a relatively good language in this regard, because it can be compiled to the JVM, JavaScript, and native code. So if the main complaint is that Kaitai Struct cannot be called from native code, IMO it would be more productive to work on supporting Scala Native rather than rewriting the compiler entirely.

dgelessus avatar Mar 30 '22 12:03 dgelessus

I'm interested in generating parsers/generators in Verilog/SystemVerilog, which would be used as one endpoint with the existing C/C++ implementations as the other.

I've used Jinja for code generation in previous projects, and I feel like a plugin that supports generating code from Jinja templates would go a long way towards making parsers in new languages easier to create, especially where demand isn't great enough to justify a native implementation.

Jinja is primarily a Python tool, but I know that Jinjava exists, though I haven't used it.
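
To show the kind of thing I mean, here is a small sketch of template-driven generation of a SystemVerilog packed struct from a list of fields. To keep it runnable without extra packages it uses Python's standard-library `string.Template` as a stand-in for Jinja, so it is not Jinja itself; the field list, the `FIELD_BITS` table and the `render_struct` helper are hypothetical, and a real plugin would get its input from the compiler rather than from a hand-written list.

```python
from string import Template

# Bit widths for the unsigned integer types used in this sketch.
FIELD_BITS = {"u1": 8, "u2": 16, "u4": 32, "u8": 64}

# Templates for the generated code; ${name} needs braces because it is
# followed directly by "_t".
STRUCT_TEMPLATE = Template("typedef struct packed {\n$fields\n} ${name}_t;\n")
FIELD_TEMPLATE = Template("  logic [$msb:0] $id;")


def render_struct(name: str, seq: list) -> str:
    """Render one packed struct with a `logic` vector per field."""
    fields = "\n".join(
        FIELD_TEMPLATE.substitute(msb=FIELD_BITS[f["type"]] - 1, id=f["id"])
        for f in seq
    )
    return STRUCT_TEMPLATE.substitute(name=name, fields=fields)


# Hand-written field list standing in for a parsed .ksy `seq`.
generated = render_struct(
    "packet",
    [{"id": "magic", "type": "u4"}, {"id": "length", "type": "u2"}],
)
print(generated)
```

With Jinja the templates could carry loops and conditionals themselves, so supporting a new target language would mostly mean writing a new set of template files.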

Would be interested to hear if this sort of thing would be of use to anyone else.

threewholefish avatar Jun 14 '22 16:06 threewholefish

BTW Verilog allows specifying whole protocols. It would be nice if we could specify protocol state machines in Verilog + parsers in KS and then transpile this into I/O-less protocol implementations for other languages. BTW Scapy has some DSL for protocols too, but I haven't dug into it.

KOLANICH avatar Jun 14 '22 17:06 KOLANICH