Proposal: Add action templates functionality
This is an idea about how to resolve #4067 without developing an entirely new actions language.
This PR enables users to write cross-target grammars that make use of actions by letting them supply action templates as StringTemplate .stg group files on the command line.
By providing different action templates for each target language, users can provide a different implementation of the action logic for each target.
Java.stg:
normalizerImports() ::= <<
import java.text.Normalizer;
import java.text.Normalizer.Form;
>>
normalize(s) ::= <<Normalizer.normalize(<s>, Form.NFKC)>>
getText() ::= <<getText()>>
setText(s) ::= <<setText(<s>);>>
Javascript.stg:
normalizerImports() ::= ""
normalize(s) ::= <<<s>.normalize("NFKC")>>
getText() ::= "this.text"
setText(s) ::= "this.text = <s>"
The example below is the motivating example for me - I have a grammar that I'd like to use in both a JVM-based compiler and a VS Code extension:
Example.g4:
lexer grammar Example;
@lexer::header {
<normalizerImports()>
}
ID : (ID_START ID_CONTINUE* | '_' ID_CONTINUE+) { <setText(normalize(getText()))> } ;
ID_START : [\p{XID_Start}] ;
ID_CONTINUE: [\p{XID_Continue}] ;
WS : (' '|'\n') -> skip ;
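For reference, here is a hedged sketch of how the generator might then be invoked per target; the -DactionTemplates flag spelling follows this PR's proposal, everything else is standard antlr4 usage:
antlr4 -Dlanguage=Java -DactionTemplates=Java.stg Example.g4
antlr4 -Dlanguage=JavaScript -DactionTemplates=Javascript.stg Example.g4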
Thanks to my employer @opencastsoftware for sponsoring this work!
Mmmm... this IS an "entirely new actions language". The fact that it leverages templates makes it look like not (and is elegant), but we'd still have to opine on the syntax and more importantly: the supported set of 'keywords', both characteristics of a language... The latter could be a blocker because we precisely don't want to find ourselves in a situation where we'd have to expand that set for every language, because the existing set does not support the specific use case a user needs to support... Using templates for testing works because the set is limited to what the engine requires, and the task of supporting additional templates lies with the developer needing it, but with your proposal that task would lie with target authors who don't actually need it, so not sure it would work well. As you know, there are other options being discussed.
Mmmm... this IS an "entirely new actions language". The fact that it leverages templates makes it look like not (and is elegant), but we'd still have to opine on the syntax and more importantly: the supported set of 'keywords', both characteristics of a language...
Hmm can you explain what you mean by that? The syntax is that of StringTemplate, and all of the template identifiers used are user-defined - they’re simply supplied as StringTemplate groups.
Ah I think I understand - I’m not proposing that there is a centrally managed collection of StringTemplate groups shipped with ANTLR - this PR adds a command line option via which users can supply their own templates which they have written themselves.
Writing cross-target grammars is a rather advanced use case and I think it’s reasonable to expect users to learn StringTemplate syntax if they want to do that.
Why StringTemplate?
- It's used in the implementation of ANTLR itself and in the ANTLR code generation process, so it's familiar to ANTLR contributors
- It's already on the classpath of the ANTLR tool, so there are no new dependencies needed
- It's a stable and well-tested template language that will allow users to provide whatever kind of action logic they want
I have a few observations and questions.
- I've been writing target-agnostic Antlr grammars probably more than most, porting grammars-v4 to different targets, pretty much every day of the week for several years, and just got through yet another, python/python3. The main difficulty is that the runtime APIs across targets are inconsistent. Three bugs were discovered (https://github.com/antlr/antlr4/issues/4318 https://github.com/antlr/antlr4/pull/4340 https://github.com/antlr/antlr4/pull/4343). While annoying, dealing with actions per target is manageable using a target-agnostic format.
- Importantly, why is this part of a parser generator? And, as with any parser generator, where is the boundary between parser generator and some other tool that uses it? For example, I am firmly against embedding a sub-par XPath engine (pre-version 1) in the Antlr runtime. It exists only for some targets. That is why I ported an XPath version 2 engine to C#, and added a DOM interface over the Antlr parse tree in a separate tool. Antlr trees change very rarely, which means that the tool can be released on its own schedule, not tied at all to a minor version number change in Antlr.
- Note, I've been writing, already, a string processor for Antlr grammars, trgen, so I am familiar with ST--and all its problems--and the idea of separating ST processing from the parser generator. (I've been thinking of a more general processor as well, and have always been struck by the fact that ST does not contain a general-purpose, command-line processor.)
- I am not at all happy with the way the model-view is separated in ST. For example, you cannot write a simple string equality test:
<if(target=="CSharp")>....<endif>.
While annoying, dealing with actions per target is manageable using a target-agnostic format.
I think the Fortran90 grammar is actually a good example in itself of the need for some kind of solution for target-agnostic actions.
The target-agnostic format described there relies upon superClass, language keywords like this and property dereference via . in order to work, which means that it falls short for non-OO targets like Go and needs adjustments in order to produce valid code. It’s great that such a pattern exists and is well documented, but I think a more integrated solution would be preferable.
In many cases it would be really difficult for users to integrate such a script into their workflow (e.g. when the ANTLR generator is called by a build tool plugin or as part of some other complex workflow).
Importantly, why is this part of a parser generator? And, as with any parser generator, where is the boundary between parser generator and some other tool that uses it?
ST is already a dependency of the ANTLR tool, so I guess to some extent the question of whether to embed it into ANTLR has already been decided. The expansion of ST templates in this PR is completely optional and is only active if the user provides the -DactionTemplates command line flag.
I suppose the boundary here is the point of template expansion, which happens early in the ANTLR generation process just after parsing the grammar file itself. It's also where the most troubling interactions occur, in terms of showing users error messages relating to the template expansion process.
I am not at all happy with the way the model-view is separated in ST.
I think that is a moot point in this implementation as no model binding is performed - the user has to write all of the groups that they want to use in their .stg template, and they can use only those groups in their grammar. 😄
Usually, I would say that ST should be supported outside the tool, but after thinking about this a bit, I think you are right. The main reason it should be supported in the tool is that people would say a grammar that requires further processing shouldn't be called an Antlr grammar. In addition, the grammar could be tagged as requiring a certain version of Antlr. I will try out some examples shortly.
Semantic predicates are actions, too. It's not expanding template references there, but it is working elsewhere.
Temp.stg:
header() ::= <<
int count;
>>
action1() ::= <<
this.count = 0;
>>
pred() ::= <<
this.count \<= 0
>>
action2() ::= <<
this.count++;
>>
Expr.g4:
grammar Expr;
options { actionTemplates='Temp.stg'; }
@members {
<header()>
}
file : x (SEMI x)* SEMI? EOF;
x : { <action1()> } expression;
expression : expression POW expression
| expression (TIMES | DIV) expression
| expression (PLUS | MINUS) expression
| LPAREN expression RPAREN
| (PLUS | MINUS)* atom ;
atom : scientific | { <pred()> }? variable { <action2()> } ;
scientific : SCIENTIFIC_NUMBER ;
variable : VARIABLE ;
VARIABLE : VALID_ID_START VALID_ID_CHAR* ;
SCIENTIFIC_NUMBER : NUMBER (E SIGN? UNSIGNED_INTEGER)? ;
LPAREN : '(' ;
RPAREN : ')' ;
PLUS : '+' ;
MINUS : '-' ;
TIMES : '*' ;
DIV : '/' ;
GT : '>' ;
LT : '<' ;
EQ : '=' ;
POINT : '.' ;
POW : '^' ;
SEMI : ';' ;
WS : [ \r\n\t] + -> channel(HIDDEN) ;
fragment VALID_ID_START : ('a' .. 'z') | ('A' .. 'Z') | '_' ;
fragment VALID_ID_CHAR : VALID_ID_START | ('0' .. '9') ;
fragment NUMBER : ('0' .. '9') + ('.' ('0' .. '9') +)? ;
fragment UNSIGNED_INTEGER : ('0' .. '9')+ ;
fragment E : 'E' | 'e' ;
fragment SIGN : ('+' | '-') ;
Semantic predicates are actions, too. It's not expanding template references there, but it is working elsewhere.
Ah thanks for trying it out - I will take a look at semantic predicates tomorrow
Looks like you need to test for SEMPRED in addition to ACTION in expandActionTemplates(GrammarRootAST root). Then, trim the extra '?' from the text of the predicate. Honestly, I don't understand why the text isn't just trimmed of '{', '}', and '?' when the parse tree is constructed. You might need to check the equivalent action blocks for a lexer grammar. I don't know if the parse tree is very different.
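To make the trimming step concrete, here is a minimal sketch in Java, assuming a hypothetical helper inside the expansion code (the class and method names are illustrative, not the PR's actual API):
// Hypothetical helper: strip the delimiters from an ACTION or SEMPRED token's
// text before handing the body to StringTemplate for expansion.
class ActionTextUtil {
    static String stripActionDelimiters(String tokenText) {
        String s = tokenText.trim();
        // semantic predicates carry a trailing '?' after the closing brace, e.g. "{ <pred()> }?"
        if (s.endsWith("?")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        // both actions and predicates are wrapped in '{' ... '}'
        if (s.startsWith("{") && s.endsWith("}")) {
            s = s.substring(1, s.length() - 1);
        }
        return s;
    }
}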
Otherwise, I think this solution is better than the others we've thought of up to now. It fixes the "p" vs "l" problem, as well as the "this." vs "$this->" vs "self." vs "this->" problem. And it doesn't result in a proliferation of if-then-else code sprinkled through the grammar itself.
I don't think we have any more non-OO targets. The Go runtime is slammed into something that resembles an OO framework. It just wasn't implemented well to begin with. But, ST actions could come in handy with a new target like C.
This doesn't take care of the target-specific code in locals and returns clauses. But, we don't use those features in grammars-v4.
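For readers who haven't used those clauses, a hedged illustration of why they are target-specific: the declared types and the code that fills them are written in the target language (Java here; the rule is made up):
addExpr returns [int value]
    : left=INT '+' right=INT { $value = Integer.parseInt($left.text) + Integer.parseInt($right.text); }
    ;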
You got my vote. @parrt Please consider this PR. It goes a long way to fixing the problems in writing grammars over different targets.
A similar but more independent approach would be to write the grammar as a template and apply the transforms in order to produce a target-specific .g4. Less convenient, but it does not require any change to the antlr4 tool.
The problem with writing the grammar as an STG-format file is that it is then not an Antlr grammar. The format could not be parsed using the grammars-v4 antlr4 grammar. Existing VSCode or VSIDE extensions would not work. People would not call it an Antlr grammar. The use of the template calls would not be restricted to just the actions. Not a good solution.
You could implement this as a separate command-line tool called "antlr4-plus", processing the .g4 as an ST-formatted file where ST attributes are referenced only in the actions, the proposed format for the grammar. This tool could parse, then render a normal Antlr4 grammar. VSCode and VSIDE extensions would still work. But, people would not call the grammar an "Antlr4 grammar", but an "Antlr4-plus" grammar. It would be labeled a third-party tool, and any grammars in the format would probably get a new extension, like ".g4plus".
The template group file reference is directly listed as an option in the grammar. It's an explicitly listed dependency, which is a much better thing than hiding it. If the grammar option contains options { actionTemplates='Temp.stg'; }, and folks accidentally run the Antlr4 tool on it, the Antlr tool would give a warning because the tool warns on all options it does not know. If the build uses -Werror, the build breaks, which is correct, but people would say "What is this? Is this an Antlr4 grammar?". They're then going to have to redo all their build tooling, with new rules in the NPM .json, pom.xml, .csproj, etc. teverett and KvanTTT won't like adding .g4plus files to grammars-v4 because they're not exactly Antlr4 grammars.
Folks are very hesitant to use third-party tools. They won't know how to install them, or they'll screw up the install. Antlr4-tools is official and directly referenced on antlr.org, nothing further to do. Nobody trusts my Trash toolkit--years in development, used in grammars-v4, extremely useful. But, it's not listed on antlr.org, won't ever get there, and is only modified by me. There are several other tools that are Antlr-based, and they're also not listed there, like XText (although that is Antlr3-based). I'm sure there are others.
@kaby76 re the template not being a valid grammar, that's a fair point.
That said I think actionTemplates should be a command line option only. Its purpose is to provide a different implementation per target. Making it a grammar option (using options { actionTemplates='Temp.stg'; }) pushes the responsibility of locating that template to antlr, which is wrong.
@DavidGregory084 there is a big downside to this which relates to automation. Many developers use the maven plugin to generate code at build time (rather than at dev time - which I prefer). The plugin would now need to know where to find the action templates, on a per target basis. Does your PR cater for this ?
Making it a grammar option (using options { actionTemplates='Temp.stg'; }) pushes the responsibility of locating that template to antlr, which is wrong.
Well, I could solve the explicit dependency by applying a naming standard to the .g4's and .stg's, e.g., Python3Lexer.g4/Python3Lexer.stg and Python3Parser.g4/Python3Parser.stg. I already apply a coding standard for the location of target-specific files over in grammars-v4. It took years to straighten it all out, but all target-specific files are in sub-directories Cpp/, CSharp/, Dart/, Go/, Java/, JavaScript/, PHP/, Python3/, and TypeScript/. So, these .stg's would go into each of the sub-directories. That's fine because that's where the superClass files already exist. The base classes can be easily rewritten as .stg's. I'd know which grammar requires a command-line option -actionTemplates=foobar for trgen, and could then generate a build script in whatever build system (dotnet, maven, npm, etc) to apply the command-line option.
But, why is an equivalent actionTemplates "wrong"? The Antlr tool already has a superClass option in the grammar. Why even use options { superClass=Python3LexerBase; } when we can just do antlr4 -DsuperClass=Python3LexerBase Python3Lexer.g4 and assume so because we have a naming scheme?
there is a big downside to this which relates to automation. Many developers use the maven plugin to generate code at build time (rather than at dev time - which I prefer). The plugin would now need to know where to find the action templates, on a per target basis. Does your PR cater for this ?
All the more reason that it should be explicitly listed in the grammar file, and not pushed to the build tool. You won't have to change the build tool, except that the grammar now requires a new version of the Antlr tool, which people upgrade to anyway.
@kaby76 Sorry to disagree, but that would have to be a new entry in the pom file. I am not supportive of an approach that requires the tool to resolve the stg file path in the first place, and differently for each target.
@kaby76 Sorry to disagree, but that would have to be a new entry in the pom file. I am not supportive of an approach that requires the tool to resolve the stg file path in the first place, and differently for each target.
First, actionTemplates is not a directory, it's a top-level file name, including path. The path can encode the target, but it's best not to do that. And, I would not have the -Dlanguage option (or hardwired option in the grammar file) affect where to find the templates. That would not be referentially transparent.
At least in grammars-v4, nothing in the .g4 would need to be changed per target. Just place the .stg's in the same directory as the .g4's. The command lines to generate the target-specific recognizers would still be the same:
antlr4 -actionTemplates=Python3Lexer.stg -Dlanguage=CSharp Python3Lexer.g4; antlr4 -actionTemplates=Python3Parser.stg -Dlanguage=CSharp Python3Parser.g4
or
antlr4 -actionTemplates=Python3Lexer.stg -Dlanguage=Java Python3Lexer.g4; antlr4 -actionTemplates=Python3Parser.stg -Dlanguage=Java Python3Parser.g4
The target-specific nature of the templates is laid out in the coding standard--the directory structure--after years of arguing about it in grammars-v4. But, as far as I know, the Maven plugin does not work for targets other than Java. For trgen, the target-specific files are overlaid onto the directory where the .g4's reside.
If you really want to place the .g4's and .stg's in separate directories, the Antlr tool already has a way to do that with -lib on the command line. I think the PR should be updated to reference that path. The target-specific nature would then be tied to the explicit target via the -Dlanguage and -lib options together.
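For illustration, a hedged sketch of the layout and invocation that suggestion could lead to, assuming the tool were taught to resolve the group file relative to -lib (the PR does not do this today, and all file names here are illustrative):
Python3Lexer.g4
CSharp/action-templates.stg
Java/action-templates.stg
antlr4 -Dlanguage=CSharp -lib CSharp -DactionTemplates=action-templates.stg Python3Lexer.g4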
Nothing in the pom.xml would need to change whatsoever other than now adding the -actionTemplates option--which I wouldn't do, because I would always write the grammar with the option in the grammar itself, just like the superClass option. You could then group the .stg files with the .g4's all in one directory. Trgen would still work because the target-specific files are copied to where the .g4 files are, overwriting the Java-specific files.
That said I think actionTemplates should be a command line option only. Its purpose is to provide a different implementation per target. Making it a grammar option (using options { actionTemplates='Temp.stg'; }) pushes the responsibility of locating that template to antlr, which is wrong.
I agree that a target-specific option does not make a lot of sense in the grammar file - the only reason that I have done it this way is that the "language" option is already a grammar-level option and "actionTemplates" seems to belong wherever "language" is defined - happy to change it.
I expect most folks would provide "actionTemplates" as a command line option.
EDIT: I see that @kaby76 has a slightly different pattern in mind but both usage patterns would work with a grammar-level option I think?
@DavidGregory084 there is a big downside to this which relates to automation. Many developers use the maven plugin to generate code at build time (rather than at dev time - which I prefer). The plugin would now need to know where to find the action templates, on a per target basis. Does your PR cater for this ?
Not at the moment - I think that would require changes to the Maven plugin and I haven't looked at that part of the project yet.
The Gradle plugin already requires manually manipulating the command line options for any advanced use-cases, e.g. here's the invocation from my current project:
tasks.generateGrammarSource {
    // See: https://github.com/antlr/antlr4/issues/2335
    val outputDir = file("build/generated-src/antlr/main/org/mina_lang/parser")
    // the directory must exist or ANTLR bails
    doFirst { outputDir.mkdirs() }
    arguments = arguments + listOf(
        "-visitor",
        "-no-listener",
        // the lexer tokens file can't be found by the parser without this
        "-lib", outputDir.absolutePath,
        "-package", "org.mina_lang.parser")
}
Having said that, I don't really know which use cases would require providing different arguments to the same build tool?
e.g. in my case, I will probably move my grammar to a top-level directory, and symlink it into my Gradle source directories and my npm source directories. Then each build tool will invoke the ANTLR tool with different command line arguments to generate the target that's relevant for that build tool - Java for my Gradle build, TypeScript for my npm build.
In https://github.com/antlr/antlr4/pull/4345/commits/f89c5205b38f054978b42263a71e3b0fa21afdcc I've tried to connect up the error reporting from StringTemplate with the positions in the ANTLR grammar file. You can see the results in tool-testsuite/test/org/antlr/v4/test/tool/TestActionTemplates.java. I'm sure that there are cases that I haven't thought of which could result in janky position reporting - I think it will take a few folks kicking the tires to iron out the problems with it!
That said I think actionTemplates should be a command line option only. Its purpose is to provide a different implementation per target. Making it a grammar option (using options { actionTemplates='Temp.stg'; }) pushes the responsibility of locating that template to antlr, which is wrong.
I agree that a target-specific option does not make a lot of sense in the grammar file - the only reason that I have done it this way is that the "language" option is already a grammar-level option and "actionTemplates" seems to belong wherever "language" is defined - happy to change it.
I expect most folks would provide "actionTemplates" as a command line option.
Yes, respectfully I disagree. The .stg's are just like imported .g4's. Collectively, it's the "grammar". Without either, it's not a whole grammar. The target-specific nature should be in the -lib/-Dlanguage options, not in the path in the actionTemplates option.
Following this logic, why even have options { superClass=Python3LexerBase; } in the grammar? Why not always say antlr4 -DsuperClass=Python3LexerBase. It does not make sense; it is not consistent.
Yes, respectfully I disagree. The .stg's are just like imported .g4's. Collectively, it's the "grammar". Without either, it's not a whole grammar. The target-specific nature should be in the -lib/-Dlanguage options, not in the path in the actionTemplates option.
I can definitely see the viewpoint that it is like an imported file.
To try and explain the other viewpoint a bit more - the "actionTemplates" argument is the mechanism that allows the grammar to receive different target-specific template snippets depending on the given -Dlanguage option, so from that perspective it makes sense for it to be a target-specific option.
By making "actionTemplates" part of the grammar file, users are forced to use relative paths in order to supply different target-specific templates for each target, which imposes a specific approach and folder structure on users of the option.
I hope that explanation makes sense?
@kaby76 a couple of thoughts:
- superClass is NOT target dependent. I can have the same superClass name regardless of the target. I personally do that.
- the stg IS target dependent, and being target specific is its only purpose - if I just need 1 target then I can write action code directly in the grammar. So where would the tool find that target-specific file ?
- If the full path is hardcoded as an option in the grammar, then it's not target specific.
- If it's a file name relative to a directory structure, then it forces every developer to replicate a specific directory structure (is that the one that we've picked for the antlr4-grammars), and places the responsibility of locating the actionTemplates file on the tool, 2 things which we absolutely want to avoid.
Semantic predicates are actions, too. It's not expanding template references there, but it is working elsewhere.
@kaby76 I added a test using a cut-down version of your example to ensure that semantic predicates are being expanded in ~https://github.com/antlr/antlr4/pull/4345/commits/b619c20e5381784d8461f9c2d0a2d4845b674fcb~ https://github.com/antlr/antlr4/pull/4345/commits/ef95015c3627462b08ccf360a21d9ea91168f68e (whoops forgot to sign off commit)
We'd also need to support this in the IntelliJ plugin
if I just need 1 target then I can write action code directly in the grammar. So where would the tool find that target-specific file ?
I have my preferences, but you're going to have to pick your poison, i.e., choose a coding standard. But, yes, you absolutely don't want to force a coding standard in the Antlr tool itself, i.e., "the .stg files must be named Java.stg, CSharp.stg, etc"--because there may be different top-level stg files for each grammar. Or, "the .stg files must be placed under a directory that is named after the target, and further named after the name of the grammar".
In the grammars-v4 repo, the target-specific files are placed in a sub-directory with a target name. That's been agreed upon after years of bantering about. I really don't want to revisit this because it's been hell just to get to this standard, common coding convention across all 300 grammars. Trgen will read the grammar and copy all files from the target-specific directory into the directory with the grammars. You can even fork a .g4, and place it in the target-specific directory. I needed something so I can just write trgen and do testing in a nice, consistent, template-driven manner.
And, yes, I would also never hardwire the target into the grammar, either in the language option, or via actionTemplates option, or the actions themselves.
But, I would hardwire the action template file name into the grammar, and never use a relative or absolute path. I would like the feature to honor -lib because I think that would be useful for other developers. But whether or not we have a -lib option for searching for .stg's, it doesn't matter: trgen will just copy the files CSharp/* or Java/* or what have you to where the .g4's live. The advantage of hardwiring options { actionTemplates=... } into the grammar is that it forces the tool to raise an error if the file is missing. Missing files should be flagged as an error at tool time. I swear, someone is going to type antlr4 -Dlanguage=CSharp Python3Lexer.g4, and expect a functioning lexer out the back. Then, I'll start seeing posts from folks wondering "The tool generated a parser, but it's not compiling. Why is this? Why isn't Antlr generating the parser for CSharp? What are these <foobar> thingies sprinkled in the grammar? I used the -Dlanguage=CSharp option, just do it!". We're going to get these questions at least a couple times a month.
BTW, if action templating is outside the Antlr tool, someone is going to take a grammar from grammars-v4, insist on not installing the third-party tool, and start manually editing the .g4, moving the template code into the .g4! We'll never stop hearing the complaints that "This is too much work! This is not an Antlr grammar!". At least right now folks can take a "target agnostic grammar" and manually adjust the .g4, changing "this." to "this->" or what have you, because they refuse to run python translateGrammar.py.
Again, whatever is done, the fewer assumed coding standards within the tool, the better in my book. I cannot tell you how many times I've been irritated by the forced capitalization convention for exporting in Golang, or the "One True Brace Style" (1TBS) in Golang (give me ';' any day!), or.... even the fact that in an Antlr grammar an uppercase first letter can only be a lexer symbol, and a lowercase one a parser symbol!!
@kaby76
A side-effect of this proposal, if accepted, will be to remove the need for a per language folder in grammars-v4, except when a base recogniser or parser is genuinely required (which I believe is rare, but you know better).
If for example I look at the folder for java, there would no longer be a need for Java9LexerBase, and I suspect the structure could become as simple as:
Java9Lexer.g4
Java9Parser.g4
java-action-templates.stg
csharp-action-templates.stg
...
So relying, within the tool, on a hard coded structure that is bound to disappear doesn't seem like the best move...
Plus, it's much less likely to be approved. The smaller the footprint, the higher the chances.
@kaby76
A side-effect of this proposal, if accepted, will be to remove the need for a per language folder in grammars-v4, except when a base recogniser or parser is genuinely required (which I believe is rare, but you know better).
If for example I look at the folder for java, there would no longer be a need for Java9LexerBase, and I suspect the structure could become as simple as:
Java9Lexer.g4
Java9Parser.g4
java-action-templates.stg
csharp-action-templates.stg
...
So relying, within the tool, on a hard coded structure that is bound to disappear doesn't seem like the best move...
Plus, it's much less likely to be approved. The smaller the footprint, the higher the chances.
We could have the files either way. Neither trgen nor the Antlr tool would care. I'll have to chat about it with the admins over in grammars-v4.
Option 1
java/java9/Java9Parser.g4
java/java9/Java9Lexer.g4
java/java9/Java/action-templates.stg
java/java9/CSharp/action-templates.stg
...
Option 2
java/java9/Java9Parser.g4
java/java9/Java9Lexer.g4
java/java9/Java-action-templates.stg
java/java9/CSharp-action-templates.stg
...
I agree, make this PR as small as possible to give @parrt less material to complain about. :)
trgen won't care whether all the .stg's are in the same directory as the .g4's or in the target-specific directory. All files from the target-specific directory are copied into the directory containing the grammars.
The templates for the trgen driver code would be straightforward in either case. For example, with "Option 1" and the Java target, the antlr4 call would require -DactionTemplates='<x.GrammarName>.stg'. For "Option 2", I'd need to add -DactionTemplates='<target>-<x.GrammarName>.stg'. I don't have an attribute <target>--I never had the need. But, easy enough to add.
I really hope this is merged. It cleans up so many junky hacks.
Ok so I guess the next step is to get @parrt's in-principle approval. Then we can look at side effects: maven plug-in, IntelliJ plug-in, antlr lab ... any other ? (Nobody's maintaining the eclipse plug-in I'm afraid...) @parrt any thoughts ? It seems that this elegant proposal would address a long-standing and highly desired cross-target capability with very little impact.
@parrt the general idea follows what we implemented for testing i.e. string templates. The templates would be provided by the developer via a cmd line option, meaning we only provide the automation, not the templates themselves.
Trying this PR out on the python/python3 grammar. So far I have Cpp, CSharp, and Java working fine. It looks good. The "transformGrammar.py" hack is gone and there are fewer files for Cpp (no headers that I need to declare). I can probably combine the lexer and parser stg's together for a target. This would result in half the files that define the grammar.