grammars-v4
Add grammars-v4 dir indexer
Hi. (I screwed up and pushed to master despite having a git branch command in my history. Weird.) Related to supporting lab.antlr.org and https://github.com/antlr/antlr4-lab/issues/11...
@teverett @kaby76 @KvanTTT please take a look at _scripts/mkindex.py, which generates a JSON list from a grammars-v4 path name:
$ python mkindex.py .. | jq
[
{
"name": "regex",
"lexer": "https://raw.githubusercontent.com/antlr/grammars-v4/master/xsd-regex/regexLexer.g4",
"parser": "https://raw.githubusercontent.com/antlr/grammars-v4/master/xsd-regex/regexParser.g4",
"start": "root",
"example": "example-chargroup-sub3.txt"
},
{
"name": "abb",
"lexer": "https://raw.githubusercontent.com/antlr/grammars-v4/master/abb/abbLexer.g4",
"parser": "https://raw.githubusercontent.com/antlr/grammars-v4/master/abb/abbParser.g4",
"start": "module",
"example": "robdata.sys"
},
{
"name": "DGS",
"lexer": "https://raw.githubusercontent.com/antlr/grammars-v4/master/graphstream-dgs/DGSLexer.g4",
"parser": "https://raw.githubusercontent.com/antlr/grammars-v4/master/graphstream-dgs/DGSParser.g4",
"start": "dgs",
"example": "removeAttribute.dgs"
},
{
"name": "SwiftFin",
"lexer": "https://raw.githubusercontent.com/antlr/grammars-v4/master/swift-fin/SwiftFinLexer.g4",
"parser": "https://raw.githubusercontent.com/antlr/grammars-v4/master/swift-fin/SwiftFinParser.g4",
"start": "messages",
"example": "test1.txt"
},
{
"name": "Lucene",
"lexer": "https://raw.githubusercontent.com/antlr/grammars-v4/master/lucene/LuceneLexer.g4",
"parser": "https://raw.githubusercontent.com/antlr/grammars-v4/master/lucene/LuceneParser.g4",
"start": "topLevelQuery",
"example": "boolean-3.txt"
},
{
"name": "Cql",
"lexer": "https://raw.githubusercontent.com/antlr/grammars-v4/master/cql3/CqlLexer.g4",
"parser": "https://raw.githubusercontent.com/antlr/grammars-v4/master/cql3/CqlParser.g4",
"start": "root",
"example": "createIndex.cql"
},
...
Hey @parrt great :)
Your intention is to use this to generate a JSON file for the lab tool? Could we also use it to generate a human-readable index? Could that human-readable index be automatically included in the readme.md?
Your intention is to use this to generate a JSON file for the lab tool?
yep! @kaby76 has a bash script now.
Could trivially generate an index for the readme! I gen a list of dictionaries with info per grammar. Can gen markdown instead of json easily.
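A minimal sketch of the markdown variant, assuming each dictionary has exactly the keys shown in the JSON output above (name, lexer, parser, start, example). The function name `to_markdown` is hypothetical and not part of mkindex.py.

```python
def to_markdown(entries):
    """Render index entries (dicts shaped like mkindex.py's JSON) as a Markdown table."""
    header = [
        "| Grammar | Start rule | Lexer | Parser |",
        "|---|---|---|---|",
    ]
    rows = []
    # Sort case-insensitively so names like "abb" and "Cql" interleave naturally.
    for e in sorted(entries, key=lambda e: e["name"].casefold()):
        rows.append(
            f"| {e['name']} | `{e['start']}` | [lexer]({e['lexer']}) | [parser]({e['parser']}) |"
        )
    return "\n".join(header + rows)
```

For example, `print(to_markdown(json.load(open("grammars.json"))))` would render a generated index as a table that could be spliced into readme.md.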
It would be amazing to generate an index into the readme.md. That way we could take on the refactoring @KOLANICH has suggested and still enable people to find their favourite grammar from the main page. It would actually be a significant improvement over what we have today since, for example, there are assembler grammars under /asm which people may not realize are there.
Could a GitHub Action trigger a reindex too?
well i've never done that. Perhaps.
Yes, GitHub Actions could trigger a build of the index, but the result would need to be checked back into the tree somehow without kicking off a whole new build/commit cycle ad infinitum. I'll try to play around with this.
Honestly, I don't think such a description file is required at all. It's possible to just traverse the directories and discover the grammar files there. We have a single naming pattern for grammar files and an examples directory. Moreover, we have pom.xml files that also help (they contain info about the root rule, the examples directory, and so on).
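A minimal sketch of the directory-traversal alternative, assuming only the conventions mentioned here: `.g4` files sit together in a grammar directory, examples live in an `examples/` subdirectory, and a `pom.xml` may sit alongside. The function name and output keys are hypothetical; pairing lexers with parsers or reading the root rule from the pom is left out.

```python
from pathlib import Path

def discover_grammars(root):
    """Walk a grammars-v4 checkout and list every directory that holds .g4 files."""
    root = Path(root)
    found = []
    for d in sorted(p for p in root.rglob("*") if p.is_dir()):
        grammars = sorted(f.name for f in d.glob("*.g4"))
        if not grammars:
            continue  # not a grammar directory (e.g. _scripts or an examples dir)
        ex_dir = d / "examples"
        examples = (
            sorted(f.name for f in ex_dir.iterdir() if f.is_file())
            if ex_dir.is_dir()
            else []
        )
        found.append({
            "dir": d.relative_to(root).as_posix(),
            "grammars": grammars,
            "examples": examples,
            # The pom would still be needed for details such as the root rule.
            "has_pom": (d / "pom.xml").is_file(),
        })
    return found
```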
@KvanTTT Well, I have an implementation that works off a generated grammars-v4.js file, which is equivalent to an index of the grammars-v4 repo in json format. If you can write an implementation of antlr4-lab that works over the wire, we can look at it and compare. Note, I was planning to make a separate implementation that gathers information over the internet from the poms, but I wanted to first follow through on this design.
I think we'll go with a generated index if we can. A casual user looking for a grammar is likely to make use of it, and perhaps less likely to click through directories searching.
@kaby76 full support for the GH Actions work you're doing.
Plus I need something I can simply download from the antlr lab to get the files.
I think I found the trick to do a check-in of the generated index file. See this PR. The current build doesn't kick it off because it's in a separate workflow (here). But, presumably once the workflow is added, it should work. I tested it on my own repo.
@kaby76 could it also generate markdown?
@teverett Yes, we can add a script to build some markdown.
The indexer isn't quite working yet. Something's wrong with the workflow, even though it works on a semi-duplicated "grammars-v4" repo over in my github.com account (https://github.com/kaby76/temp-with-actions). https://github.com/antlr/grammars-v4/pull/2881
"git diff --quiet" returns 0 even if there are untracked files. Git bites me again.
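For reference, `git status --porcelain` does list untracked files (with a `??` status), so it can serve as the "anything to commit?" check; another common option is `git add -A` followed by `git diff --cached --quiet`. A minimal sketch of the first approach, assuming Python is available in the workflow; the helper names are hypothetical.

```python
import subprocess

def parse_porcelain(output: str) -> list:
    """Return paths from `git status --porcelain` output, untracked ('??') entries included."""
    paths = []
    for line in output.splitlines():
        # Each line is "XY PATH": two status columns, a space, then the path.
        # Renames appear as "old -> new" and unusual names may be quoted; not handled here.
        if len(line) > 3:
            paths.append(line[3:])
    return paths

def worktree_has_changes(repo: str = ".") -> bool:
    """True if tracked files changed OR untracked files exist (unlike `git diff --quiet`)."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return bool(parse_porcelain(out))
```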
The last change seemed to fix the problem with indexing, and we have an updated "grammars.json" file from @parrt's indexer. Let's see how this works in lab.antlr.org.
- The generated file from _scripts/mkindex.py contains entries where grammars are named by the declared name in the pom.xml (or maybe in the grammarDecl itself?). There are multiple grammars with the same names, e.g., the grammar at grammars-v4/javascript/javascript/ and the grammar at grammars-v4/javascript/jsx/ are both "JavaScript". How does one distinguish between the two except by looking at content, e.g., the Antlr4 grammars?
- Only about 60 grammars are listed in the generated index. I'm not sure why. There should be over 200.
- The index doesn't sort the grammar entries by "name".
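Sorting could be added when the index is written. A minimal sketch, assuming the entry shape from the JSON output above; the helper names are hypothetical. Tie-breaking on the lexer URL keeps duplicate names (such as the two "JavaScript" grammars) in a stable order, which also keeps diffs small when a workflow commits the file.

```python
import json

def sorted_index(entries):
    """Sort by name case-insensitively, then by lexer URL for duplicate names."""
    return sorted(entries, key=lambda e: (e["name"].casefold(), e["lexer"]))

def dump_index(entries) -> str:
    # Indentation and a trailing newline give line-oriented, reviewable diffs.
    return json.dumps(sorted_index(entries), indent=2) + "\n"
```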
I updated my fork of the Antlr lab to read the grammars.json file. Looks good. https://github.com/kaby76/antlr4-lab/tree/add-grammars-v4
You can see it in action on this droplet while it's up. http://134.209.209.215/
If we want to add information to the repo for programming language classification, reference links, or any other information, we're going to need to determine where to add it.
Add it to the readme.md, but make it more standardized
We could add the information for a grammar to the readme. Right now, there is no standardized format of what to document.
In the existing pom.xml, per grammar
The pom.xml could hold the information used to generate the index, but it would have to be added carefully within the file. I tried to add a <classification>....</classification> element under /project, and it was rejected. I then tried to place it under the first /project/build/plugins/plugin element, and it too was rejected. I finally placed it under the /project/build/plugins/plugin/configuration element, and that worked, presumably because the antlr4-maven-plugin and antlr4test-maven-plugin don't check for spurious elements.
Alternatively, we could create a new plugin that does little except provide a place to nest information about the grammar within the pom.xml.
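If the information lives under /project/build/plugins/plugin/configuration as described above, the indexer could pull it out with the standard library. A minimal sketch, assuming the experimental <classification> element (not an established convention) and the Maven default namespace; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def _local(tag: str) -> str:
    # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]

def _children(elem, name):
    return [c for c in elem if _local(c.tag) == name]

def read_classifications(pom_xml: str) -> list:
    """Collect <classification> text from /project/build/plugins/plugin/configuration."""
    project = ET.fromstring(pom_xml)
    found = []
    for build in _children(project, "build"):
        for plugins in _children(build, "plugins"):
            for plugin in _children(plugins, "plugin"):
                for cfg in _children(plugin, "configuration"):
                    for cls in _children(cfg, "classification"):
                        if cls.text and cls.text.strip():
                            found.append(cls.text.strip())
    # The same tag may be repeated on both plugins; drop duplicates, keep order.
    return list(dict.fromkeys(found))
```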
In a separate xml file, per grammar
We could add a new file with structured information for indexing.
In a global xml file
We could add grammar information in one big file in the root of the repo.
- Only about 60 grammars are listed in the generated index. I'm not sure why. There should be over 200.
I noticed that as well, but I had fairly strict constraints. If you take a look at the code, you'll notice that it tosses out anything without an example, I think, or with more than two grammars. One I saw had a lexer, a parser, and a "hints" grammar. I stripped those out because the ANTLR lab won't be able to handle them. Probably we need a flag on the indexer to generate one thing for the repository and another for the lab.
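A minimal sketch of such a flag, assuming the two constraints described above (at least one example, no more than two grammar files). The `GrammarDir` record and the `for_lab` parameter are hypothetical, not mkindex.py's actual structure.

```python
from dataclasses import dataclass, field

@dataclass
class GrammarDir:
    """Hypothetical per-directory record; field names are illustrative only."""
    name: str
    g4_files: list = field(default_factory=list)
    examples: list = field(default_factory=list)

def lab_compatible(g: GrammarDir) -> bool:
    # At least one example, and one or two grammar files (a lexer/parser pair
    # or a single grammar); a third file such as a "hints" grammar disqualifies it.
    return bool(g.examples) and 1 <= len(g.g4_files) <= 2

def select(grammars, for_lab: bool):
    """for_lab=True: filtered list for the lab; False: everything, for the repo index."""
    return [g for g in grammars if not for_lab or lab_compatible(g)]
```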
There are multiple grammars with the same names, e.g., the grammar at grammars-v4/javascript/javascript/ and the grammar at grammars-v4/javascript/jsx/ are both "JavaScript".
rats. Would it be unique if we included the directories containing the grammar like javascript/JavaScript? Or, perhaps the grammar file names are unique?
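The directory is already recoverable from the raw URLs in the index, so names could be qualified only where they collide. A minimal sketch, assuming the URL layout shown in the JSON output above; the helper names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

RAW_PREFIX = "/antlr/grammars-v4/master/"

def grammar_dir(url: str) -> str:
    """'.../master/xsd-regex/regexLexer.g4' -> 'xsd-regex'."""
    path = urlparse(url).path
    if path.startswith(RAW_PREFIX):
        path = path[len(RAW_PREFIX):]
    return path.rsplit("/", 1)[0]

def unique_labels(entries):
    """Keep unique names as-is; prefix duplicated names with their directory."""
    counts = Counter(e["name"] for e in entries)
    return [
        e["name"] if counts[e["name"]] == 1 else f"{grammar_dir(e['lexer'])}/{e['name']}"
        for e in entries
    ]
```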
I updated my fork of the Antlr lab to read the grammars.json file.
Seems like we'd want the grammars.json file in this repo, not the lab, right? In other words, the lab uses it but doesn't own it and doesn't have the code to generate it.
The pom.xml could contain the information in the generation of the index
It seems to make sense to keep the classification, or location within the ontology, at the definition of the grammar, and then have a tool that pulls that information to create an index. We could also have multiple classifications or tags to create different kinds of indexes, such as assembly languages vs. data languages vs. high-level languages.
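With several tags per grammar, each index is just a grouping over the same entries. A minimal sketch, assuming a hypothetical "tags" list on each index entry (no such key exists in the current grammars.json); untagged grammars land in an "unclassified" bucket.

```python
from collections import defaultdict

def indexes_by_tag(entries):
    """Group grammar names under each tag; a grammar with several tags appears in several indexes."""
    groups = defaultdict(list)
    for e in entries:
        for tag in (e.get("tags") or ["unclassified"]):
            groups[tag].append(e["name"])
    return {tag: sorted(names, key=str.casefold) for tag, names in sorted(groups.items())}
```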