kaitai_struct_compiler icon indicating copy to clipboard operation
kaitai_struct_compiler copied to clipboard

[WIP] Rust Compiler

Open bspeice opened this issue 5 years ago • 9 comments

Everything compiles, and the tests run (even though plenty fail).

There's plenty of implementation work left to be done, but this starting point solves (as far as I can tell) all the Rust-specific implementation issues.

A quick note on the design, though I'm planning to do a more in-depth write-up later. There are two lifetimes used to track references: the first is a "read" lifetime, used to communicate references that are valid only while reading, and a "stream" lifetime, used to communicate data that lives as long as the stream from which we're reading. This allows zero-copy parsing (stream lifetime), while still being able to traverse the entire _root/_parent hierarchy (read lifetime).

Ultimately, Rust is a bit different because (unlike C++) we never store a reference to _io, _root, or _parent; they must be provided everywhere they're used (read() and instance calculation).

@CWood1 - I'd be interested in your design input. The lifetimes are a bit thorny (because I need to prove that _io references outlive self, _root and _parent), and the TypedStack implementation I'm using is pretty complex, but I don't know if it's reasonably possible to do any better. Let me know if there's any info I can provide.

EDIT: Supersedes and deprecates #161.

bspeice avatar May 11 '19 03:05 bspeice

This looks good and relatively minimal to me — @bspeice, @CWood1 — are you ok if we'll start with merging this in?

GreyCat avatar May 11 '19 18:05 GreyCat

I'm finishing some clean-up on the tests; let's merge those at the same time, as it will make things much easier to develop in parallel.

bspeice avatar May 13 '19 14:05 bspeice

How does having separate lifetimes for "read" and "stream" work with instances? As I understand it, the lazy evaluation of instances means they are not evaluated at "read-time". Moreover, they can potentially be wide-ranging, accessing data outside the current stream (e.g., items from the parent or root streams).

Will this design hold up under those constraints?

(I apologize if I'm creating work for someone else, but I don't speak Rust and I haven't got the bandwidth to try to answer this question myself.)

webbnh avatar May 13 '19 15:05 webbnh

No worries, and that's a phenomenal question. Calls to instance calculation use 'read and 'stream the same as calls to read(); when calculating instances, it's required that you pass in _io, _root, and _parent. However, passing those parameters all the time doesn't make for a great API, so I'm planning on exposing the underlying storage (Option<inst_type>) as well.

The reason I went with this design is because I'm not sure it's possible to convince Rust that storing references to _root and _parent in the child can ever be safe. You could propose making additional 'root and 'parent lifetimes, but that would make everything far more complex, to the point that I'm not sure it can pass borrowck. Specifically, if a child has an immutable borrow of 'root for the entirety that _root is alive, _root is not allowed to borrow itself as mutable; that's problematic when we need to read data into a child from the stream, then later update _root from the same stream. There's probably some RefCell<> shenanigans you can get into (store _root and _parent as immutable borrows everywhere, and put all actual storage in an inner object), but I'd rather not go down that route unless absolutely necessary.

EDIT: It should also be noted that using Box<>/Rc<> is an option; I'm personally trying to avoid the allocator wherever it's not strictly necessary, but using heap types would make all the lifetime issues go away real quick.

That does leave open a question I have for @GreyCat though: What are the semantics for accessing _root._io? From the C++ I was reading, it appears to be that it is effectively the same as _io with some pushPos() popPos() logic attached, but if it's more complex than that we made need the RefCell<> design I mentioned earlier.

bspeice avatar May 13 '19 16:05 bspeice

Progress continues! We now generate struct bodies correctly, and (almost) all the tests are compiling (and failing). @GreyCat - as mentioned earlier, if you still want to merge, I'm OK with that as we now compile and should run in CI.

bspeice avatar May 17 '19 13:05 bspeice

OK, tests are compiling, and basic parsing code is generating. Tests are still failing, but at least we have things to test against. @CWood1, if you're still around, I could use help with how to plug this into JUnit, and after that, I think it's a matter of continuing to hack on this until it passes the test suite.

bspeice avatar Jun 04 '19 21:06 bspeice

Should be rebased, I think?

XVilka avatar Nov 11 '19 06:11 XVilka

@XVilka - You're right that it would need rebasing, but I don't have the time or desire to continue managing this. If the goal is handling binary parsing in Rust, libraries like nom already do a phenomenal job of this; my recommendation would be to use that instead.

I've been leaving this open on the off-chance that someone wants to take it up, but I don't plan on ever returning to this.

bspeice avatar Nov 13 '19 00:11 bspeice

Rust support PR

Agile86 avatar Mar 16 '23 01:03 Agile86