
[FEATURE] Move ANTLR + Docs

Open YANG-DB opened this issue 1 year ago • 4 comments

Is your feature request related to a problem?

This is the first step of consolidating the PPL spec into a single repository, which will be deployed as an independent artifact.

What solution would you like?

First step:

  • add artifacts for a Java Gradle project
  • add the PPL & SQL ANTLR specs
  • add the PPL & SQL docs

#23 - Main RFC

Do you have any additional context?

  • https://github.com/opensearch-project/sql/tree/main/docs
  • https://github.com/opensearch-project/sql/tree/main/ppl/src/main/antlr
  • https://github.com/opensearch-project/opensearch-spark/tree/main/docs/ppl-lang
  • https://github.com/opensearch-project/opensearch-spark/tree/main/ppl-spark-integration/src/main/antlr4

YANG-DB, Nov 27 '24 20:11

@YANG-DB At a high level that makes sense, but I reckon there are a few questions that should be answered before anyone jumps into the implementation:

Versioning

This ANTLR grammar artifact will have its own versioning, which is separate from the OpenSearch version, is that correct?

One or more grammar or command syntax changes will be added or modified incrementally in each release. Each driver will then work on its own platform-specific implementation and bump the artifact version accordingly to support the new syntax when it's ready.
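Under this model, a driver needs a rule for deciding whether the grammar version it implements covers the syntax a query was written for. Below is a minimal Java sketch of such a rule, assuming plain semantic versioning (MAJOR.MINOR.PATCH, new syntax in minor releases, breaking changes in major releases). `GrammarVersion` and `supports` are hypothetical names, not part of any existing artifact.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: a semantic version for a standalone PPL grammar artifact.
 * The name and the compatibility rule are illustrative assumptions only.
 */
record GrammarVersion(int major, int minor, int patch) implements Comparable<GrammarVersion> {

    GrammarVersion {
        if (major < 0 || minor < 0 || patch < 0) {
            throw new IllegalArgumentException("version components must be non-negative");
        }
    }

    /** Parses a plain "MAJOR.MINOR.PATCH" string (no pre-release or build metadata). */
    static GrammarVersion parse(String text) {
        String[] parts = Objects.requireNonNull(text).split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MAJOR.MINOR.PATCH: " + text);
        }
        return new GrammarVersion(
                Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]));
    }

    /**
     * A driver implementing this grammar version can accept syntax that targets
     * {@code required} if the major versions match (no breaking changes between them)
     * and this version is at least as new (new commands arrive in later releases).
     */
    boolean supports(GrammarVersion required) {
        return major == required.major && compareTo(required) >= 0;
    }

    @Override
    public int compareTo(GrammarVersion other) {
        int byMajor = Integer.compare(major, other.major);
        if (byMajor != 0) return byMajor;
        int byMinor = Integer.compare(minor, other.minor);
        if (byMinor != 0) return byMinor;
        return Integer.compare(patch, other.patch);
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + patch;
    }
}
```

With this rule, a driver at 1.2.0 accepts syntax targeting 1.1.x or 1.2.0. It rejects 1.3.0 (syntax it has not implemented yet) and 2.0.0 (a breaking grammar change).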

Syntax gap

At the moment there is a gap in syntax and supported commands between the SQL plugin and the Flint implementation on Spark. Are we planning to resolve the conflicts in this ticket, or do we simply extract all the identical bits and make that v1.0.0?

If this is the first task of the epic, I prefer to simply extract all the identical bits and mark them as v1.0.0.

andy-k-improving, Nov 28 '24 18:11


@andy-k-improving yes, you are correct: we will first finalize as much as we can in the issues before starting development.

Versioning: We will support semantic versioning for the PPL ANTLR grammar, but not for Spark/OpenSearch. I prefer we create another mechanism that allows each engine to use the general PPL commands through its own dialect. We can take inspiration from other general-purpose query engines, such as Calcite, and give each engine a specific dialect on top of the general PPL commands.

To summarize, the dialect file will be another resource that can be versioned if needed.
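The dialect idea can be pictured as one shared PPL command set plus a small, separately versioned per-engine dialect that adds or withholds commands. Below is a minimal Java sketch under those assumptions. `PplDialect`, its constructor parameters, and the command lists are hypothetical and only illustrative; `ml` and `ad` stand in for engine-specific commands.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of a per-engine PPL dialect: the shared command set,
 * minus any shared commands the engine does not support yet, plus its extras.
 */
final class PplDialect {
    /** Commands assumed to be shared by every engine (illustrative subset). */
    static final Set<String> CORE_COMMANDS =
            Set.of("search", "where", "fields", "stats", "sort", "eval", "head");

    private final String name;
    private final String version;   // the dialect resource carries its own version
    private final Set<String> commands;

    PplDialect(String name, String version, Set<String> extraCommands, Set<String> unsupportedCore) {
        this.name = name;
        this.version = version;
        Set<String> all = new LinkedHashSet<>(CORE_COMMANDS);
        all.removeAll(unsupportedCore);   // shared commands this engine lacks
        all.addAll(extraCommands);        // engine-specific additions
        this.commands = Collections.unmodifiableSet(all);
    }

    String name() { return name; }

    String version() { return version; }

    /** True if the given command keyword is allowed in this dialect (case-insensitive). */
    boolean supports(String command) {
        return commands.contains(command.toLowerCase(Locale.ROOT));
    }
}
```

For example, an OpenSearch dialect built with the extra commands `ml` and `ad` accepts both, while a Spark dialect with no extras rejects them. Both accept the shared commands. Keeping the differences in a separate, versioned dialect leaves the shared grammar engine-neutral.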

Syntax gap: We will work on two threads in parallel:

  • One thread will simply move all the PPL-related resources into this repo.
  • The other thread will continue filling the PPL command gap in OpenSearch.

This will simplify the development process and avoid blocking either thread. Once the OpenSearch commands thread is stable enough to call ready, we will change the necessary code so that both engines start using the new consolidated ANTLR resources.

YANG-DB, Nov 28 '24 20:11

[Catch All Triage - 1, 2, 3]

dblock, Dec 16 '24 17:12

I prefer we create another mechanism to allow utilizing the general PPL command with each engine using its own dialect.

Agreed. We do have OpenSearch-specific commands for ML, AD, and Alerting that are not available in Spark PPL, but introducing dialects constrains the unified PPL approach. I am more interested in supporting a standard PPL, similar to ANSI SQL.

penghuo, Feb 14 '25 16:02