
User defined keys in grmtools section

Open ratmice opened this issue 2 months ago • 2 comments

This is something I avoided putting much thought into during the 0.14.0 development cycle. Currently the %grmtools directive is very conservative: when the CTBuilders and nimbleparse read a %grmtools section, they complain if any of the keys in the section are unrecognized.

There are a number of keys that nimbleparse_lsp uses which don't have an equivalent in the %grmtools section. These are currently stored in an external nimbleparse.toml file (and there is no problem keeping it this way).

The keys are extension, which gives a file extension for the parser input, plus l_file and y_file. One could imagine the lexer having a parsers array and the parser having a lexer string (there may be multiple parsers for a lexer, but a single lexer for a parser).

I kind of see 3 options or ways that this could go:

  1. Do nothing (keep these in an external file like they currently are in nimbleparse.toml).
  2. Add some mechanism for user defined keys.
  3. Add the ad-hoc set of keys to the allowed list of keys, even if nimbleparse and the builders don't need them.

This isn't something I'm in any rush to decide. I'm currently trying to rewrite the LSP so that it is more portable to WebAssembly and can be deployed on the web. For at least the first iteration of this development I plan on keeping these keys in an external .toml file.

Overall I'm somewhat inclined to explore option two and see if we can get something reasonable, because it is impossible to say what keys might be useful (perhaps entry points for clone_and_change_start_rule, for example). At the same time, we probably don't want user defined keys standing in the way of adding new keys to the grmtools section in the future.

One common way people do user defined keys (e.g. Cargo.toml/rustdoc) is to have a metadata key, the contents of which become a free-for-all. Another way is to restrict the key namespace such that user defined keys can be recognized.

%grmtools {
  yacckind: Grmtools,
  @nimbleparse_lsp.extension: ".xyz",
  @nimbleparse_lsp.l_file: "xyz.l",
}
%grmtools {
   @nimbleparse_lsp.parsers: ["xyz.y"]
}

We could even consider the unadorned keys to be something like @grmtools.yacckind or @lrpar.yacckind. With that we could continue to complain about unknown keys within the @grmtools or @lrpar namespaces, and ignore external keys. This completely sets aside, though, the question of how user defined keys are retrieved from the files.
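As a rough illustration of that rule, here is a minimal sketch of how a key might be classified. Everything in it is hypothetical: `KeyClass`, `classify_key`, and the two constant lists are made up for this example and are not part of grmtools, and `KNOWN_KEYS` contains only the one key mentioned in this thread.

```rust
/// How a key found in a `%grmtools` section would be treated (hypothetical).
#[derive(Debug, PartialEq)]
enum KeyClass<'a> {
    /// A key grmtools itself understands, e.g. `yacckind` or `@grmtools.yacckind`.
    Known(&'a str),
    /// An unrecognized key in a reserved namespace: still an error.
    UnknownReserved(&'a str),
    /// A key in some other `@namespace`: ignored by grmtools.
    External { namespace: &'a str, field: &'a str },
    /// An `@name` with no `.field` part.
    Malformed(&'a str),
}

// Assumed for illustration; the real list of keys lives inside grmtools.
const RESERVED_NAMESPACES: &[&str] = &["grmtools", "lrpar"];
const KNOWN_KEYS: &[&str] = &["yacckind"];

fn classify_key(key: &str) -> KeyClass<'_> {
    match key.strip_prefix('@') {
        // Unadorned keys are treated as if they lived in a reserved namespace.
        None if KNOWN_KEYS.contains(&key) => KeyClass::Known(key),
        None => KeyClass::UnknownReserved(key),
        Some(rest) => match rest.split_once('.') {
            Some((ns, field)) if RESERVED_NAMESPACES.contains(&ns) => {
                if KNOWN_KEYS.contains(&field) {
                    KeyClass::Known(field)
                } else {
                    KeyClass::UnknownReserved(key)
                }
            }
            // Anything outside the reserved namespaces is left to its owner.
            Some((ns, field)) => KeyClass::External { namespace: ns, field },
            None => KeyClass::Malformed(key),
        },
    }
}
```

Under this sketch `@nimbleparse_lsp.extension` is classified as external and skipped, while `@lrpar.bogus` or a bare `bogus` would still produce a complaint.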

To get these values out of the lex/yacc file we probably don't want to just copy/paste the cttests/grmtools_section.test parser into external projects and have them read the section directly, since that could make extending the value types impossible. A small API which, given a lex/yacc file and a namespace like "@nimbleparse_lsp", transforms the matching entries to JSON or something similar could work, perhaps along with a restriction that user defined namespaces can only contain numeric, string and array types.
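A minimal sketch of what such an API could look like follows. It assumes the section has already been parsed into key/value pairs (the parsing itself is not shown), and the names `Value`, `value_to_json` and `namespace_to_json` are invented for this example rather than taken from grmtools.

```rust
/// The restricted set of value types allowed in user defined namespaces
/// (hypothetical; mirrors the "numeric, string and array" restriction).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(u64),
    Str(String),
    Array(Vec<Value>),
}

/// Quote a string as a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn value_to_json(v: &Value) -> String {
    match v {
        Value::Num(n) => n.to_string(),
        Value::Str(s) => escape_json(s),
        Value::Array(items) => {
            let parts: Vec<String> = items.iter().map(value_to_json).collect();
            format!("[{}]", parts.join(","))
        }
    }
}

/// Collect every entry under `namespace` (e.g. "@nimbleparse_lsp") into a
/// JSON object keyed by field name, preserving the order of `entries`.
fn namespace_to_json(entries: &[(&str, Value)], namespace: &str) -> String {
    let prefix = format!("{}.", namespace);
    let fields: Vec<String> = entries
        .iter()
        .filter_map(|(key, value)| {
            key.strip_prefix(prefix.as_str())
                .map(|field| format!("{}:{}", escape_json(field), value_to_json(value)))
        })
        .collect();
    format!("{{{}}}", fields.join(","))
}
```

Because the consumer only ever sees JSON, grmtools would stay free to change how the section is parsed internally, and new value types could be added later without breaking external readers.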

I'd be curious if anything here strikes a chord or causes an averse reaction. As I said before, though, I'm not in a state where I could start consuming these keys even if we added them; there is a lot of work to be done before I could use them.

ratmice avatar Oct 26 '25 14:10 ratmice

Dumb idea: why not allow %XYZ directives, e.g. %nimbleparse_lsp? We could pass a list of these to grmtools and tell it to ignore ["XYZ", ...], and to error on any directive not in that list, or similar?

ltratt avatar Oct 27 '25 13:10 ltratt

Yeah, the notion of passing a list of things to be ignored also occurred to me (as entries in the grmtools section rather than %directives, but similar idea), that would work and it was what I was initially going to propose.

The reason I eventually ended up proposing @crate.field (which we could also do with directives, e.g. %@nimbleparse_lsp.extension xyz) was that I was afraid we could eventually run into clashing directives, where some people use the same declaration names but different value formats, or similarly where we want to add a new directive to grmtools itself but it clashes with some user defined one.

By picking a prefix like @ we can ensure we at least take some steps to avoid alphanumeric conflicts. E.g. we document that we only reserve directives prefixed with %@grmtools, or %@crate for crates we control, and that all other fields are ignored and fair game, though we suggest people stick to the %@crate convention.

So that was kind of the thinking that led to that thought: that @ could behave as a built-in ignore list.

In theory, if a directive is both present in the ignore list and a valid lex/yacc directive, we could just not treat it as a lex/yacc directive. But we don't really have a single place where directives are handled, so I'd be somewhat afraid that this would be difficult to do uniformly across all directives.
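To make the precedence concrete, here is a small sketch of that rule, assuming there were a single place to apply it. `Directive` and `resolve_directive` are hypothetical names and nothing like them exists in grmtools today.

```rust
/// Outcome of looking up a `%name` directive (hypothetical).
#[derive(Debug, PartialEq)]
enum Directive {
    /// Named in the user supplied ignore list: skipped entirely.
    Ignored,
    /// A directive that lex/yacc itself understands.
    Builtin,
    /// Neither ignored nor built in: reported as an error.
    Unknown,
}

/// The ignore list is consulted first, so a name that appears in both lists
/// is not treated as a lex/yacc directive.
fn resolve_directive(name: &str, builtin: &[&str], ignore: &[&str]) -> Directive {
    if ignore.contains(&name) {
        Directive::Ignored
    } else if builtin.contains(&name) {
        Directive::Builtin
    } else {
        Directive::Unknown
    }
}
```

The function itself is trivial; the difficulty is that every code path which currently recognizes a directive would need to call something like it before doing anything else.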

I'm not sure it's a real concern, but I do like the future compatibility that the @ prefix provides.

ratmice avatar Oct 27 '25 14:10 ratmice