[RFC] PPL Extension Mechanism
Is your feature request related to a problem?
PPL is a language made up of multiple commands, and it can be used in different engines that cover a wide variety of use cases and functionality.
For PPL to use these capabilities to their full extent, it has to be dynamically extensible with commands that are either domain specific or execution specific.
For example:
- Security / Observability should have vocabularies that are distinct to these domains, and PPL users should be able to use them freely in their statements.
- Geospatial / Textual should likewise provide geospatial- or text-specific capabilities designed for use within a geospatial search engine or a text-search-based engine.
Such vocabulary should not become part of the standard PPL language, because it serves a specific use case or domain and is not relevant to other use cases.
What solution would you like?
The PPL grammar should provide an extension point that enables other plugins to define their own command name, parameters, and output schema.
Backend plugins can also extend the resulting data types and functions, making them accessible through PPL data types and functions.
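As a rough illustration of what such an extension point could carry, the sketch below models a descriptor with a command name, parameters, and output schema, plus a registry that rejects duplicate command names. All class names, the `geodistance` command, and its columns are hypothetical and are not part of any existing PPL or OpenSearch API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical descriptor a plugin would supply for one extension command. */
final class CommandExtensionDescriptor {
    final String commandName;
    final List<String> parameters;
    final Map<String, String> outputSchema; // column name -> type name

    CommandExtensionDescriptor(String commandName,
                               List<String> parameters,
                               Map<String, String> outputSchema) {
        this.commandName = commandName;
        this.parameters = List.copyOf(parameters);
        this.outputSchema = Collections.unmodifiableMap(new LinkedHashMap<>(outputSchema));
    }
}

/** Hypothetical registry; command names are compared case-insensitively. */
final class CommandExtensionRegistry {
    private final Map<String, CommandExtensionDescriptor> byName = new LinkedHashMap<>();

    void register(CommandExtensionDescriptor descriptor) {
        String key = descriptor.commandName.toLowerCase(Locale.ROOT);
        if (byName.putIfAbsent(key, descriptor) != null) {
            throw new IllegalStateException("Command already registered: " + key);
        }
    }

    Optional<CommandExtensionDescriptor> lookup(String commandName) {
        return Optional.ofNullable(byName.get(commandName.toLowerCase(Locale.ROOT)));
    }
}
```

A plugin would then register something like `new CommandExtensionDescriptor("geodistance", List.of("from", "to"), Map.of("distance_km", "double"))`, and the query engine would call `lookup` when it meets a command the core grammar does not know.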
Do you have any additional context?
- https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/cmd/ml.rst
Extending the ANTLR grammar
To dynamically extend a language defined using ANTLR without introducing new code, we can consider a modular grammar design. ANTLR allows a grammar to be written in parts that are merged by importing external grammar files; the options below describe how this could be made to work at runtime.
- Grammar Inclusion: ANTLR supports grammar file modularization through import statements, where new rules can be added via partial grammars. However, ANTLR generates code at compile time, so dynamic discovery of grammar components requires pre-compiling these fragments.
- Dynamic Loading with Delegated Parsers: We could dynamically load additional rules or new syntactic constructs using delegated parsers. The core grammar would handle the base syntax, while extension grammars would handle specific constructs or operations dynamically.
OpenSearchPPLLexer.g4
OpenSearchPPLParser.g4
/extensions
    /grammar
        extensionParser1.g4            # Additional grammar rules for an extension
        extensionLexer1.g4             # Additional grammar rules for an extension
    /compiled
        Extension1Lexer.class          # Compiled Lexer for extension1
        Extension1Parser.class
        Extension1BaseVisitor.class
        Extension1BaseListener.class
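The delegated-parser idea can be sketched without ANTLR at all: a core parser handles the commands it knows and hands everything else to registered extension parsers. The interface and class names below are hypothetical, and the prefix check on `fields` and `where` is only a stand-in for the real core grammar; in an actual implementation each delegate would wrap a generated ANTLR parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical contract for an extension parser; not an ANTLR or OpenSearch type. */
interface ExtensionCommandParser {
    /** Returns a description of the parsed command, or empty if the text is not recognized. */
    Optional<String> tryParse(String commandText);
}

/** Sketch of a core parser that delegates unknown commands to registered extensions. */
class DelegatingCommandParser {
    private final Map<String, ExtensionCommandParser> extensions = new LinkedHashMap<>();

    void register(String extensionName, ExtensionCommandParser parser) {
        extensions.put(extensionName, parser);
    }

    String parse(String commandText) {
        String text = commandText.trim();
        // Stand-in for the core grammar: only two commands are treated as standard here.
        if (text.startsWith("fields ") || text.startsWith("where ")) {
            return "core:" + text;
        }
        // Otherwise try each extension in registration order.
        for (Map.Entry<String, ExtensionCommandParser> entry : extensions.entrySet()) {
            Optional<String> result = entry.getValue().tryParse(text);
            if (result.isPresent()) {
                return entry.getKey() + ":" + result.get();
            }
        }
        throw new IllegalArgumentException("Unrecognized command: " + text);
    }
}
```

Trying extensions in registration order keeps the sketch short; a real design would probably need an explicit rule for what happens when two extensions claim the same command.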
Workflow for Extension Implementors
- Define Grammar Rules: Extension implementors create new grammar files (e.g., extension1.g4).
- Validate Grammar: Implementors use a validation tool (provided by our utilities) to check grammar correctness and rule compliance with the base grammar (e.g., naming conflicts, syntactic integrity, security):
    validate-grammar extension1.g4
- Compile Grammar: Implementors generate the parser sources (including the visitor) and compile them into the /compiled folder:
    antlr4 -visitor -o /compiled/ extensionParser1.g4
    javac -d /compiled/ /compiled/Extension1*.java
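The naming-conflict part of the validation step could look roughly like the sketch below. It is not the actual validate-grammar tool: the class name is hypothetical, and rule definitions are only approximated with a regular expression (an identifier at the start of a line followed by a colon), whereas a real validator would parse the grammar files properly.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative naming-conflict check between a base grammar and an extension grammar. */
class GrammarConflictCheck {
    // Approximation: a rule name at the start of a line, followed by ':' on the
    // same line or on the next one.
    private static final Pattern RULE_DEFINITION =
        Pattern.compile("(?m)^\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*:");

    static Set<String> ruleNames(String grammarText) {
        Set<String> names = new LinkedHashSet<>();
        Matcher matcher = RULE_DEFINITION.matcher(grammarText);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    /** Rules the extension redefines that are not explicitly allowed extension hooks. */
    static Set<String> conflicts(String baseGrammar,
                                 String extensionGrammar,
                                 Set<String> allowedOverrides) {
        Set<String> clashes = ruleNames(extensionGrammar);
        clashes.retainAll(ruleNames(baseGrammar));
        clashes.removeAll(allowedOverrides);
        return clashes;
    }
}
```

With `extensible_command` passed as the only allowed override, an extension that also redefines a core rule such as `query` would be reported, while one that only overrides the hook would pass.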
Main ANTLR Grammar (Root Folder)
The main grammar should include:
- A Rule for Commands: A general rule that includes core commands and allows for extensible commands:
    grammar MainGrammar;
    query: command+; // Queries are composed of commands.
    command
        : standard_command
        | extensible_command
        ;
    standard_command
        : ...
        | ...
        ;
    extensible_command
        : EXTENSIBLE_COMMAND // Placeholder for extension commands.
        ;
    EXTENSIBLE_COMMAND
        : .+? // Match unknown commands dynamically (overridden by extensions).
        ;
- Hook for Future Rules: The extensible_command rule is a hook that extensions are expected to replace or augment with specific rules.
- Extension Grammar Compatibility: EXTENSIBLE_COMMAND is a catch-all placeholder rule. When extensions are loaded, this rule is overridden to parse the specific commands provided by the extensions.
Example Extension Grammar (/extensions/grammar/extensionParser1.g4)
The extension defines its additional command:
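    grammar ProjectView;
    // Overrides the extensible_command hook of the base grammar.
    extensible_command
        : project_view_command
        ;
    // Written as a parser rule (lowercase name) so the keywords are separate tokens.
    // As a lexer rule, the literals would have to appear back-to-back with no whitespace.
    // IDENTIFIER is assumed to be provided by the base lexer.
    project_view_command
        : 'PROJECT' 'VIEW' 'AS' 'SELECT' '*' 'FROM' IDENTIFIER
        ;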
How This Works
- The Base Grammar: The base ANTLR grammar (OpenSearchPPLParser.g4) defines a flexible extensible rule (extensible_command). It is designed to allow unknown constructs but doesn’t itself define specifics.
- Extension Grammar: Extensions redefine or extend the extensible_command rule, introducing specific syntax such as PROJECT VIEW AS SELECT * FROM table.
- Runtime Loading: The base grammar is combined with the pre-compiled extension grammars at runtime.
- Dynamic Behavior: When the query PROJECT VIEW AS SELECT * FROM table is encountered, the dynamically loaded extension grammar handles the parsing.
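One possible way to pick up the pre-compiled extension classes at runtime is a `URLClassLoader` pointed at the compiled folder. The sketch below only shows the class-loading step; the class name `ExtensionClassLoading` and the `<extension>Parser` naming convention are assumptions, and wiring the loaded parser into the base parser is left out.

```java
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.Optional;

/** Illustrative loader for pre-compiled extension parser classes. */
class ExtensionClassLoading {

    /** Assumed naming convention: extension "Extension1" ships "Extension1Parser". */
    static String parserClassName(String extensionName) {
        return extensionName + "Parser";
    }

    /** Loads a class from the compiled-extensions directory, or returns empty if absent. */
    static Optional<Class<?>> loadCompiledClass(Path compiledDir, String className) {
        try {
            URL[] urls = { compiledDir.toUri().toURL() };
            // The loader is deliberately left open so the loaded class stays usable;
            // a real plugin host would manage its lifecycle explicitly.
            URLClassLoader loader =
                new URLClassLoader(urls, ExtensionClassLoading.class.getClassLoader());
            Class<?> loaded = loader.loadClass(className);
            return Optional.of(loaded);
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning an empty `Optional` for a missing class lets the caller fall back to reporting an unknown command instead of failing during plugin discovery.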