antlr4
Is "!node" legal XPath?
I've ported a very large XPath engine from Java to C#, and I have been updating it to XPath 3.1 and integrating Antlr parse tree support, treating the tree as a DOM with attributes added for some node properties. It works great. However, I was going through the expressions that the XPath engine in the Antlr runtime accepts, to check that my engine supports at least everything the runtime does. It turns out it does not: it does not handle the "!" operator. Looking more closely, though, the "!" expressions given in the doc, e.g., "/expr/!primary", do not appear to be legal XPath, at least not in version 3.1. The expression should instead use a filter predicate with the "not" function, e.g., "/expr/*[not(self::primary)]", which my engine supports. I checked the spec and three online XPath engines; none of them accept the "!" operator used this way.
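To make the equivalence concrete, here is a minimal sketch of rewriting the Antlr-style "!name" step into the standard predicate form. The helper `to_standard_xpath` is hypothetical (it is not part of the Antlr runtime or of my engine), and it assumes node names are plain identifiers and that the expression contains no string literals or predicates of its own.

```python
import re

# Hypothetical helper, not part of the ANTLR runtime.
# Matches a "!name" step that directly follows "/" (so "/!name" and "//!name").
# Assumption: names are plain identifiers; no string literals or predicates
# appear in the input expression.
_NEGATED_STEP = re.compile(r"(?<=/)!([A-Za-z_][A-Za-z0-9_]*)")


def to_standard_xpath(expr: str) -> str:
    """Replace each '!name' step with the standard '*[not(self::name)]'."""
    return _NEGATED_STEP.sub(r"*[not(self::\1)]", expr)


print(to_standard_xpath("/expr/!primary"))  # /expr/*[not(self::primary)]
print(to_standard_xpath("//!ID"))           # //*[not(self::ID)]
```

Expressions without a "!" step pass through unchanged, so a preprocessing pass like this could sit in front of a standards-conforming engine.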
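The XPath spec does not define ! as a prefix negation operator. (XPath 3.0 and later do define a binary ! as the simple map operator, but that is unrelated to this usage.)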
The XPath 1.0 operators are | (set union), / and // (path composition), =, !=, <=, <, >=, and > (comparison), and +, -, div, mod, and * (arithmetic). Note that there aren't any symbol-based boolean operators: XPath uses and and or instead of the more common & and |, and, as you point out, has the not() function instead of a boolean-not operator.
@RossPatterson I think my point with this GitHub issue is that the Antlr XPath implementation is non-standard. It doesn't implement XPath1, let alone XPath2, and it isn't implemented across all targets. It should be removed and, if possible, replaced with something much more useful.
I use XPath2 for all my "greps" across the parse tree, e.g., for identifying direct left recursion in an Antlr grammar, for marking certain lexer rules as "fragment" rules, and for picking off all string literals in a grammar so as to create a list of lexer rules that recognize those keywords.
Moving beyond "grepping", I'm implementing tree rewrite via a ported XSLT engine. With that, I could then solve problems like rewriting a '?' operator and converting hardwired code initialization into a JSON declaration.
I'd rather see effort in fixing the Antlr tree, token, and char stream data structures. They should be redesigned to make it easier to edit the tree. In all my tools, I have to convert Antlr's data structures to my own. Antlr visitors, listeners, and token rewrite are inadequate refactoring tools because reparsing may not be viable after a refactoring, e.g., a rewrite into an AST. A tree must be maintained.