prosto
Grammar and full-featured parser for Column-SQL
Column-SQL is currently parsed by a simple function. Although it works, it has some limitations, for example with respect to literals and support for more flexible syntactic rules. The goal of this item is to define a formal grammar for Column-SQL and implement a full-featured parser. Initially, the new parser should work as an alternative to the existing (simple) parser; later it should replace it. The current assumption is that ANTLR will be used.
Define an ANTLR grammar for Column-SQL, with corresponding unit tests where appropriate. The following syntax elements should be covered:
- Column names with spaces and other special characters using name delimiters like brackets: [My Column Name]
- Short names (prefixes) for operations like CALC instead of CALCULATE or FUNC instead of FUNCTION
- Maybe allow for alternative keywords like ARGS or MODEL
- Case-insensitive keywords
- If possible, arbitrary source code for functions and JSON for arguments: simply do not parse whatever comes after a certain keyword. One approach might be to introduce special tags to distinguish between function definition categories
- If possible and relevant, try to determine the role of each name, for example whether it is supposed to be a table or a column (in particular, depending on the operation)
- Alternative: either parse column paths or treat them as complex names to be parsed by the topology translator if necessary (topology translator can resolve inherited columns and add new operations depending on the context)
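Before committing to a grammar, the lexical requirements above (bracket-delimited names, case-insensitive keywords, and short keyword prefixes) can be prototyped in plain Python. The following is only a minimal sketch, not the planned ANTLR grammar: the keyword set, the minimum prefix length `MIN_PREFIX`, the token categories, and the example statement are all assumptions made for illustration.

```python
import re

# Hypothetical canonical keywords; the real Column-SQL keyword set may differ.
KEYWORDS = ("TABLE", "CALCULATE", "FUNCTION", "ARGS", "MODEL")

# Assumed minimum length of a short form such as CALC or FUNC.
MIN_PREFIX = 4


def resolve_keyword(word):
    """Map a full or abbreviated keyword (any case) to its canonical form, else None."""
    upper = word.upper()
    matches = [
        k for k in KEYWORDS
        if k.startswith(upper) and (upper == k or len(upper) >= MIN_PREFIX)
    ]
    # Accept only unambiguous matches.
    return matches[0] if len(matches) == 1 else None


TOKEN_RE = re.compile(
    r"""
    \[(?P<delimited>[^\]]*)\]             # delimited name, e.g. [My Column Name]
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)      # bare word: keyword or simple name
  | (?P<punct>->|[(),])                   # punctuation assumed for this sketch
  | (?P<space>\s+)                        # whitespace is skipped
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split a statement into (kind, value) pairs; kind is KEYWORD, NAME or PUNCT."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"Unexpected character {text[pos]!r} at position {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "space":
            continue
        if kind == "delimited":
            # Bracketed text is always a name, even if it looks like a keyword.
            tokens.append(("NAME", m.group("delimited")))
        elif kind == "word":
            keyword = resolve_keyword(m.group("word"))
            tokens.append(("KEYWORD", keyword) if keyword else ("NAME", m.group("word")))
        else:
            tokens.append(("PUNCT", m.group("punct")))
    return tokens


# Hypothetical statement, used only to exercise the tokenizer.
print(tokenize("calc [My Table]([unit price], qty) -> amount"))
```

In this sketch a bare word is resolved as a keyword wherever it appears, so a column literally named `model` would have to be written as `[model]`. A real grammar could instead decide this by position, which is also where determining table versus column roles would fit.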