Allow custom parsers to be created, separate roblox and 5.2 features into their own crates

Open Kampfkarren opened this issue 3 years ago • 4 comments

This is extremely tired rambling, so bear with me.

I think it's uncomfortable that full-moon must undergo a major version change nearly every time Luau (Roblox's Lua) changes something about its syntax, as it forces ALL consumers to step up (or we have to maintain two versions).

I think, if possible, we should be able to create extensions for full-moon (not dylibs) as their own crates, such as full-moon-luau, which would be able to provide parsers to full-moon so that only it would need to undergo a major version change.

Do this with 5.2 too, and support both (perhaps provide this with a feature flag, as Lua 5.2 won't change). Base full-moon would be 5.1.

Solves #47?

Kampfkarren, Mar 18 '21 14:03

Separating into different crates and allowing custom parsers does sound interesting, and will definitely help with the Luau split. I know this is pretty theoretical right now, but I was just wondering how this would work in practice though.

I can see how defining new parsers should be relatively simple, but the main problem would be updating premade parsers to extend the syntax. For example, adding a new statement requires extending the currently defined Stmt parser to include it. I assume that, with the way parsers are currently set up, the "sub-crate" would have to overwrite the Stmt parser (and others) completely, but this may lead to repeated code. The alternative, I imagine, would be to completely redo how the parsers are written so that they are extensible. The same issue also arises if a future version of a syntax removes a previous piece of grammar, or for something like Lua 5.2 vs LuaJIT: they both implement the same "goto", but in LuaJIT it's a LastStmt, in contrast to a normal Stmt in Lua 5.2. Would there be a way to "share" this syntax between the two?

Personally, I think this is definitely a good idea to take forward if it isn't too challenging to implement, but it seems like it would require quite a shakeup of how full-moon is currently set up. I would be interested to hear your thoughts on how you envision it.

Also, I'm a bit curious about how this would solve #47. Would the downstream consumer use the extended syntax crate (e.g. luau), with the sub-crate itself having a sort of "on/off" runtime toggle built in, falling back to a "syntax only available in XXX" parser if "disabled"?

JohnnyMorganz, Mar 18 '21 23:03

This solves #47 because ideally, the API is something like this:

let parser = full_moon::new_parser()
    .with_extension(full_moon_luau_ext::LuauExtension::new())
    .build();

parser.parse(code);

This allows you to combine multiple extensions if you want, even.
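To make the builder idea concrete, here is a minimal sketch of how combining extensions could work. Nothing here is full-moon's real API: the `Extension` trait, `ParserBuilder`, `Lua52Extension`, `LuauLikeExtension`, and the keyword-only notion of "parsing" are all assumptions made up for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical extension interface: an extension only contributes keywords here.
pub trait Extension {
    fn name(&self) -> &'static str;
    fn extra_keywords(&self) -> Vec<&'static str>;
}

/// Stand-in for a Lua 5.2 extension crate; Lua 5.2 adds the `goto` keyword.
pub struct Lua52Extension;

impl Extension for Lua52Extension {
    fn name(&self) -> &'static str { "lua52" }
    fn extra_keywords(&self) -> Vec<&'static str> { vec!["goto"] }
}

/// Stand-in for a Luau-style extension; `continue` is treated as a plain
/// keyword purely for illustration.
pub struct LuauLikeExtension;

impl Extension for LuauLikeExtension {
    fn name(&self) -> &'static str { "luau-like" }
    fn extra_keywords(&self) -> Vec<&'static str> { vec!["continue"] }
}

/// The base grammar is Lua 5.1, so these are always recognized.
const LUA_51_KEYWORDS: [&str; 21] = [
    "and", "break", "do", "else", "elseif", "end", "false", "for", "function",
    "if", "in", "local", "nil", "not", "or", "repeat", "return", "then",
    "true", "until", "while",
];

pub struct ParserBuilder {
    extensions: Vec<Box<dyn Extension>>,
}

pub struct Parser {
    keywords: HashSet<&'static str>,
    extension_names: Vec<&'static str>,
}

pub fn new_parser() -> ParserBuilder {
    ParserBuilder { extensions: Vec::new() }
}

impl ParserBuilder {
    /// Registers an extension; registration order is preserved.
    pub fn with_extension(mut self, ext: impl Extension + 'static) -> Self {
        self.extensions.push(Box::new(ext));
        self
    }

    /// Merges the base keywords with whatever each extension contributes.
    pub fn build(self) -> Parser {
        let mut keywords: HashSet<&'static str> = LUA_51_KEYWORDS.iter().copied().collect();
        let mut extension_names = Vec::new();
        for ext in &self.extensions {
            keywords.extend(ext.extra_keywords());
            extension_names.push(ext.name());
        }
        Parser { keywords, extension_names }
    }
}

impl Parser {
    pub fn is_keyword(&self, word: &str) -> bool {
        self.keywords.contains(word)
    }

    pub fn extension_names(&self) -> &[&'static str] {
        &self.extension_names
    }
}
```

With this shape, `new_parser().build()` behaves as plain 5.1, while `new_parser().with_extension(Lua52Extension).with_extension(LuauLikeExtension).build()` recognizes both extra keywords. A new keyword in one dialect would then only require a release of that dialect's crate.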

I was thinking of making it so every enum has an Extended variant, maybe with something like Any...? Blegh.

> The same issue also arises here if a future version of a syntax removes a previous syntax grammar, or for something like Lua 5.2 vs LuaJIT - they both implement the same "goto", but in LuaJIT it's a LastStmt in contrast to a normal stmt in Lua 5.2. Would there be a way to "share" this syntax between the two?

Yeah. I'm not really sure the best way to handle that.

Kampfkarren, Mar 19 '21 01:03

In the same vein as the two gotos, but going a bit more radical -- say I want to create a transpiler that goes from a more C/JS-like (operator, indices) dialect of Lua to vanilla Lua, I would need to add some more tokens.

(It might not be that much change though -- it's just a bunch of extra BinOps plus a visitor to make it into a more normal-looking AST. On the other hand I guess the LastStmt thing can just be caught by a semantic check instead?)
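As a rough sketch of the "semantic check" idea: parse `goto` as an ordinary statement everywhere, then let a dialect that wants it in last position reject misplaced ones afterwards. The `Stmt` enum and `first_misplaced_goto` below are hypothetical and not taken from full-moon.

```rust
/// Hypothetical, heavily simplified statement type.
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// `goto <label>`
    Goto(String),
    /// Any other statement, kept as raw text for the sketch.
    Other(String),
}

/// Returns the index of the first `goto` that is not the final statement of
/// the block, or `None` if the block satisfies the "goto must be last" rule.
pub fn first_misplaced_goto(block: &[Stmt]) -> Option<usize> {
    block
        .iter()
        .enumerate()
        .find(|&(i, stmt)| matches!(stmt, Stmt::Goto(_)) && i + 1 != block.len())
        .map(|(i, _)| i)
}
```

A dialect that allows `goto` anywhere would simply skip this pass, so both dialects could share one parser and one AST shape.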

Artoria2e5, May 18 '21 11:05

Update on this.

I think structs will look like this:

struct LocalAssignment<P = ()> {
    local_token: Token, // Perhaps Token<P>?
    // etc...
    assignment: Expression<P>, // For sake of example...
    plugin_provided_info: P,
}

...and enums will look like:

enum Statement<P = ()> {
    LocalAssignment(LocalAssignment<P>),
    // etc
    PluginProvided(P),
}

P (by default (), to add no extra space and to ensure backwards-compatibility) probably won't have any real requirements of its own (and is completely dependent on what the authors of the extensions choose), but for the sake of plugins would be something like a serde-serializable HashMap. It's serde-serializable not for the sake of actual serde serialization, but for the sake of a consistent ABI that plugins can essentially force into standardization. Ultimately, it doesn't matter, as long as plugins are consistent, and as long as it can be ZST'd with () or !.
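A small sketch of why defaulting `P` to `()` should be free and backwards-compatible. The types are cut-down stand-ins for the ones above, and `PlainLocalAssignment`, `LuauInfo`, `plugin_overhead`, and `core_statement` are hypothetical names added only for the comparison.

```rust
use std::mem::size_of;

pub struct Token(pub String);

/// Same shape as the struct above, reduced to two fields.
pub struct LocalAssignment<P = ()> {
    pub local_token: Token,
    pub plugin_provided_info: P,
}

/// What the struct looks like without any plugin field at all.
pub struct PlainLocalAssignment {
    pub local_token: Token,
}

pub enum Statement<P = ()> {
    LocalAssignment(LocalAssignment<P>),
    PluginProvided(P),
}

/// Hypothetical payload that an extension might attach to nodes.
pub struct LuauInfo {
    pub type_annotation: Option<String>,
}

/// Extra bytes that the plugin field adds to `LocalAssignment` for a given `P`.
pub fn plugin_overhead<P>() -> usize {
    size_of::<LocalAssignment<P>>() - size_of::<PlainLocalAssignment>()
}

/// Code that never mentions `P` keeps compiling: `Statement` here means `Statement<()>`.
pub fn core_statement() -> Statement {
    Statement::LocalAssignment(LocalAssignment {
        local_token: Token("local".to_string()),
        plugin_provided_info: (),
    })
}
```

Because `()` is zero-sized, `plugin_overhead::<()>()` is zero, while a real payload such as `LuauInfo` is only paid for by consumers who opt into it. One caveat worth keeping in mind: existing code that builds structs with literal syntax would still need to supply the new field.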

As for goto being a LastStmt in LuaJIT and a Stmt in Lua 5.2... I'm not sure this system helps any more than my original thoughts. I guess it'd just be whichever extension you register first/last? We can document the order in which parsers are handled so that it can be relied upon.
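One way to picture the "whichever extension you register first" rule, as a sketch only: each extension may claim `goto`, and the first registered claim wins. The `Extension` trait, `GotoKind`, `Lua52`, `LuaJit`, and `resolve_goto` are all invented for this example.

```rust
/// How a dialect wants `goto` to be classified in the AST.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GotoKind {
    Stmt,
    LastStmt,
}

/// Hypothetical extension hook: return `None` if the extension has no opinion.
pub trait Extension {
    fn classify_goto(&self) -> Option<GotoKind>;
}

pub struct Lua52;
pub struct LuaJit;

impl Extension for Lua52 {
    fn classify_goto(&self) -> Option<GotoKind> {
        Some(GotoKind::Stmt)
    }
}

impl Extension for LuaJit {
    fn classify_goto(&self) -> Option<GotoKind> {
        Some(GotoKind::LastStmt)
    }
}

/// Extensions are consulted in registration order; the first claim wins.
pub fn resolve_goto(extensions: &[&dyn Extension]) -> Option<GotoKind> {
    extensions.iter().find_map(|ext| ext.classify_goto())
}
```

Swapping the registration order flips the result, which is exactly why the order would need to be documented before anyone relies on it. A "last wins" policy would only need the iteration reversed.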

Kampfkarren, Jul 03 '21 07:07