grammars-v4
PLSQL/Golang: need a demo of how to build a parser for PL/SQL using Golang
I want to parse Oracle SQL and extract some information, for example database, table, column, and index names. Following some introductions I found on the web, I wrote the code below. I expected it to print some output, but it doesn't work. Can someone provide a demo for me?
type TreeShapeListener struct {
	parser.BasePlSqlParserListener2
}

func NewTreeShapeListener() *TreeShapeListener {
	return new(TreeShapeListener)
}

// EnterDefault_selectivity_clause is called when production default_selectivity_clause is entered.
func (s *TreeShapeListener) EnterDefault_selectivity_clause(ctx *parser.Default_selectivity_clauseContext) {
	fmt.Println("Enter EnterDefault_selectivity_clause")
}

func (tsl *TreeShapeListener) EnterSelect_only_statement(ctx *parser.Select_statementContext) {
	fmt.Println("Enter EnterSelect_only_statement")
}

func main() {
	sql := "select colA from tblname where id in (1,2,3);"
	input := antlr.NewInputStream(strings.ToUpper(sql))
	lexer := parser.NewPlSqlLexer(input)
	stream := antlr.NewCommonTokenStream(lexer, 0)
	p := parser.NewPlSqlParser(stream)
	p.AddErrorListener(antlr.NewDiagnosticErrorListener(true))
	p.AddParseListener(NewTreeShapeListener())
	p.BuildParseTrees = true
	parent := antlr.NewBaseParserRuleContext(nil, -1)
	p.SetParserRuleContext(parent)
	p.Consume()
	return
}
I expected the functions EnterSelect_only_statement, EnterFrom_clause, and EnterWhere_clause to be called, but nothing happened.
The Go/ target files are more aspirational than functional. The problem is that the Go target generates actions that pass "l" and "p" pointers, not "self", so the use of "self" in the grammar action rules cannot work. There was some discussion where I tried to convince the folks who wrote the Go target that "l" and "p" are really, really bad names, because they make "target agnostic" grammars impossible.
The first thing to fix is to make a functioning set of grammar files. You'll need to change the use of "self" in the grammars. In the lexer grammar, change "self" to "l". In the parser grammar change "self" to "p". Then, you will need to write all the base class code, which currently doesn't even compile. Hence, all this discussion for a "preprocessor" for Antlr. (There are other reasons for that as well. I've been toying with code to re-introduce the tree construction operators into Antlr4 by generating an Antlr4 grammar with supplemental actions and declarations.)
Note, the only functioning Go target grammar that has a base class is the golang/ grammar. There you see the Go/ directory with the Go-target-specific grammar.
It looks like 4.9.3 has changed the name of the generated parameter passed to actions. It's now all "p". So something may have clicked with the Go target authors.
So, change "self." to "p." in all the grammar actions.
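A minimal sketch of that rewrite, assuming the grammar text is already in a string. The rule in `main` and the helper name `RewriteForGo` are made up for illustration, and the pattern rewrites every whole-word "self." in the text, not only those inside action braces.

```go
package main

import (
	"fmt"
	"regexp"
)

// selfRef matches "self." only when "self" is a whole word, so an
// identifier such as "myself." is left alone.
var selfRef = regexp.MustCompile(`\bself\.`)

// RewriteForGo (hypothetical helper) replaces every "self." receiver
// reference in the grammar text with "p.".
func RewriteForGo(grammar string) string {
	return selfRef.ReplaceAllString(grammar, "p.")
}

func main() {
	// Hypothetical parser rule guarded by a semantic predicate.
	rule := `cursor_expr : {self.isVersion12()}? CURSOR '(' subquery ')' ;`
	fmt.Println(RewriteForGo(rule))
}
```

For a lexer grammar generated with an older tool version, the replacement string would be "l." instead, as described above.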
I have a functioning sql/plsql parser for Go. Lots to fix in the repo, but this seems to work on the first few tests, and it is reasonably fast. Generated.zip
It's now all "p".
I'll update the translator.
@studentmain I think we might want to change all these "self." references back to "this." in the sql/plsql grammar so that your translator can no-op that for the Java and C# targets, but fix it up for Go. Or maybe the conversions should be parameterized via a JSON spec file, so that the spec notes that "this." is converted to "p."? Hardwiring the translation in the Translator or passing it via args doesn't seem right. But I'm not sure. BTW, I'm toying around with Antlr4 StringTemplates for C# to include DynamicXml field/attribute referencing, so that we can write just templates against a parse tree input. Sort of like XSLT.
Of course that's OK, and maybe the best solution. I am trying the code you provided in Generated.zip. Is this output as expected?
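To make the JSON idea concrete, here is a rough sketch of what a spec-driven conversion could look like. The spec layout, the target names used as keys, and the `LoadSpec` and `Translate` helpers are all assumptions for illustration. None of this is the actual translator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// specJSON is a hypothetical spec: target name -> (text to find -> replacement).
// Java and CSharp have no entries, so "this." is left untouched for them.
const specJSON = `{
  "Go":     {"this.": "p."},
  "Java":   {},
  "CSharp": {}
}`

// LoadSpec (hypothetical) decodes the JSON spec into nested maps.
func LoadSpec(data string) (map[string]map[string]string, error) {
	var spec map[string]map[string]string
	err := json.Unmarshal([]byte(data), &spec)
	return spec, err
}

// Translate (hypothetical) applies the replacements listed for target.
// Map iteration order is random, so the entries for one target must not
// depend on each other.
func Translate(spec map[string]map[string]string, target, grammar string) string {
	for from, to := range spec[target] {
		grammar = strings.ReplaceAll(grammar, from, to)
	}
	return grammar
}

func main() {
	spec, err := LoadSpec(specJSON)
	if err != nil {
		panic(err)
	}
	action := `{this.isVersion12()}?`
	for _, target := range []string{"Java", "Go"} {
		fmt.Printf("%-4s %s\n", target, Translate(spec, target, action))
	}
}
```

Keeping the mapping in data means a new target needs only a new entry in the spec, not a change to the translator code or its command-line arguments.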
> go run Test.go -input 'select * from tbl_name' -tree -tokens
input: select * from tbl_name token: true
0 select
1
2 *
3
4 from
5
6 tbl_name
7 <EOF>
start doTime: 0.001 s
Parse succeeded.
start show tree
(sql_script <EOF>)
[snip] the output is as expected?
Yes, the parser is working. But the parse tree is not looking great because the Go target doesn't have everything implemented, like reset(), which is used with -tokens. So, instead, try "go run Test.go -input 'select * from tbl_name' -tree". I don't know whether they have implemented reset() yet, and I don't know what they implemented to print out tokens either. I need to read the 4.9.3 runtime code and update the templates accordingly. It's been very slow going receiving new updates to the Antlr4 runtime.
So, I think you're on your way.
By the way, the Go target isn't quite implemented correctly. func setVersion12() should be func SetVersion12() because Go does not export funcs or fields named with the first character in lowercase. My mistake.
Further, the Go target declares nested structs for the parser and lexer. This is quite unfortunate because one cannot do the equivalent of the constructor that is used in the other targets. As a result, there isn't a good way to set _isVersion12 to true except in the main program Test.go after the NewPlSqlParser() call. The templates for Go should be changed so that the base class is a pointer to a struct, so a constructor can be called as with the other targets. I just don't think the Go target has been tested that much, and the grammars-v4 pile contains a lot of diverse grammars.
To get my code to work, the "set" func's in plsql_base_parser.go should be renamed to start with an uppercase letter. Further, in Test.go, the line "parser.SetVersion12(true)" should be added.
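A stripped-down sketch of the situation, assuming the generated parser embeds the base struct by value as described above. The types here are simplified stand-ins, not the generated code or the real plsql_base_parser.go.

```go
package main

import "fmt"

// PlSqlParserBase is a simplified stand-in for the hand-written base class.
type PlSqlParserBase struct{ isVersion12 bool }

// SetVersion12 is exported so the main program can call it.
func (b *PlSqlParserBase) SetVersion12(v bool) { b.isVersion12 = v }

// IsVersion12 reports the flag that grammar predicates would consult.
func (b *PlSqlParserBase) IsVersion12() bool { return b.isVersion12 }

// PlSqlParser mimics the generated parser: the base struct is embedded by value.
type PlSqlParser struct {
	PlSqlParserBase
}

// NewPlSqlParser mimics the generated constructor. It knows nothing about
// the base fields, so they keep their zero values.
func NewPlSqlParser() *PlSqlParser { return &PlSqlParser{} }

func main() {
	p := NewPlSqlParser()
	fmt.Println(p.IsVersion12()) // zero value of the embedded flag
	p.SetVersion12(true)         // the only place left to configure the base
	fmt.Println(p.IsVersion12())
}
```

Because the base has no constructor hook, the flag has to be set by the caller after construction, which is what the extra line in Test.go does.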
I've made two PRs to the Antlr Go runtime to fix the unexported "reset()" func in a lexer, and the unexported "String()" func in token.go. I will probably suggest a PR to fix the "nested struct" problem.
I will be upgrading trgen in the next day or so. Afterward, I can then check in the code for grammar sql/plsql/ and csharp/, which was a real headache. I had to change the code generation for trgen quite a bit for Go.
The problem is that with the Go target, we are stuck between a rock and a hard place. The Go language has many apparent restrictions (I am not a Go expert and there may be workarounds, but I don't know them), and the Antlr tool doesn't generate things in a way that makes it all work easily.
- Antlr defines globals for parserATN, literalNames, symbolicNames, ruleNames in a parser. If you try to have two parsers within one directory, as is done with csharp/, Go complains that there's a duplicate definition.
- If you try to put them in one directory but use different package names on the Antlr tool command (-package CSharp and -package CSharpPreprocessor), Go complains that you have two packages in one directory.
- If you try to define Go source files with a package of one name in a directory with a different name, Go complains.
- Therefore, Parser generated files must be placed in separate directories and the generated and base class files have a package name equal to the directory name. I will recommend changing the Antlr Go templates so that the names do not collide (append the grammar name), and we just offer getters on the parser object to fetch the data using a consistent name. Fixing this will IMMENSELY simplify things. This can be added on top of the three other issues I raised for the Go target.
- To make my life easier, I assumed that the Go/ directory contains the grammar and base class files under the name of the grammar, e.g., for sql/plsql, the base classes are in sql/plsql/Go/PlSql/parser_base.go. I now call the Antlr tool with the -o, -lib, and -package (all of them) to generate the right files exactly in the right directory with the right package name.
- Go does not allow
import { "../CSharp" }
to grab a shared lexer. The shared files either have to be placed in another package somewhere else, or copies of all the files have to be made. For csharp/, I make copies of the files. A terrible solution, but I do not know what else to do with this mess. I do not understand Go. I wish it were more like an OO language such as C#, or like an "assembly language plus" such as C.
- It took me two days to stumble on the right syntax to do typecasting in Go because there are no generics in Go. Typecasting is needed for grammar predicates and actions.
Once you get past all this, the compiled Go target parser is reasonably quick.
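For anyone hitting the same typecasting wall mentioned in the list above: the Go syntax is a type assertion, `value.(Type)`. Below is a minimal sketch with made-up types. `Recognizer` stands in for whatever interface type the generated predicate code receives, and `Version12Predicate` is a hypothetical helper.

```go
package main

import "fmt"

// Recognizer is a hypothetical stand-in for the interface type that
// predicate and action code receives.
type Recognizer interface{ GetState() int }

// PlSqlParser is a simplified stand-in for the concrete parser type.
type PlSqlParser struct{ isVersion12 bool }

func (p *PlSqlParser) GetState() int     { return 0 }
func (p *PlSqlParser) IsVersion12() bool { return p.isVersion12 }

// Version12Predicate recovers the concrete parser with a type assertion.
// The two-value form avoids a panic when the dynamic type is different.
func Version12Predicate(r Recognizer) bool {
	p, ok := r.(*PlSqlParser)
	if !ok {
		return false
	}
	return p.IsVersion12()
}

func main() {
	fmt.Println(Version12Predicate(&PlSqlParser{isVersion12: true}))
}
```

The single-value form `r.(*PlSqlParser)` is shorter but panics if the dynamic type does not match.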