piped-processing-language [RFC] Unified PPL Data Type

Status: Open. penghuo opened this issue 9 months ago (0 comments).

Is your feature request related to a problem?

Current State: Fragmented Data Type Systems in PPL Engines

Query engines such as OpenSearch PPL and Spark PPL employ distinct data type systems, creating interoperability challenges in multi-engine environments. Key examples include:

Type Name Mismatches: OpenSearch PPL defines INTEGER (string representation: integer), while Spark PPL uses IntegerType (string representation: int). Although both represent semantically equivalent 32-bit signed integers, the syntactic inconsistency disrupts cross-engine workflows.
Engine-Specific Types: OpenSearch PPL introduces specialized types like IP and GEO_POINT, which lack native equivalents in other engines.
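The following Python sketch illustrates the kind of normalization a client such as OpenSearch Dashboards has to perform when the two type systems disagree. It is a hypothetical helper, not part of either engine. The `normalize_type` function, the `UnknownTypeError` class, and the canonical names (`int`, `bigint`, `double`, `string`, `ip`) are assumptions made for this example. Only the integer/int pair comes from the examples above; the other rows are assumed from each engine's documented type names and should be verified.

```python
# Hypothetical mapping from (engine, reported type string) to a canonical name.
# Only the integer/int pair is taken from the RFC; other rows are assumptions.
_ENGINE_TYPE_ALIASES: dict[tuple[str, str], str] = {
    ("opensearch", "integer"): "int",
    ("spark", "int"): "int",
    ("opensearch", "long"): "bigint",
    ("spark", "bigint"): "bigint",
    ("opensearch", "double"): "double",
    ("spark", "double"): "double",
    ("opensearch", "string"): "string",
    ("spark", "string"): "string",
    # Engine-specific type with no Spark counterpart.
    ("opensearch", "ip"): "ip",
}


class UnknownTypeError(ValueError):
    """Raised when an engine reports a type with no canonical mapping."""


def normalize_type(engine: str, reported: str) -> str:
    """Map an engine-reported type string to its canonical name (case-insensitive)."""
    key = (engine.strip().lower(), reported.strip().lower())
    try:
        return _ENGINE_TYPE_ALIASES[key]
    except KeyError:
        raise UnknownTypeError(
            f"no canonical type for {reported!r} reported by {engine!r}"
        ) from None


if __name__ == "__main__":
    # Both engines describe a 32-bit signed integer; after normalization they agree.
    print(normalize_type("opensearch", "integer") == normalize_type("spark", "int"))
```

Every client would need its own copy of such a table, and it still breaks on types like ip that have no counterpart in the other engine. A common type system would make the table unnecessary.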

Impact:

Integration Issues: Tools like OpenSearch Dashboards face parsing errors or misaligned visualizations when processing results from engines with mismatched type systems.
Manual Overhead: Users must rewrite queries or cast types explicitly when migrating between engines.

What solution would you like?

To eliminate friction and ensure seamless interoperability, all PPL-compliant engines should adopt a common data type system with the following principles:

Standardized Type Names: Universal type names and string representations (e.g., int instead of INTEGER or IntegerType).
Semantic Consistency: Equivalent types (e.g., 32-bit integers) must behave identically in syntax, casting rules, and operations (e.g., arithmetic, comparisons). Engine-specific types (e.g., ip, geo_point) should be opt-in extensions with clear documentation.
Interoperability Guarantee: Queries and schemas written for one engine should execute seamlessly on others without manual adjustments.
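One way to picture these principles is a shared type catalog that every engine consults. The Python sketch below is a hypothetical illustration under stated assumptions: the core type list, the implicit-cast pairs, and the names `EngineProfile`, `can_implicitly_cast`, and `is_portable` are invented for this example. Only the treatment of ip and geo_point as opt-in extensions comes from the proposal.

```python
from dataclasses import dataclass, field

# Core types every PPL-compliant engine would support (hypothetical list).
CORE_TYPES = frozenset({"int", "bigint", "double", "string", "boolean", "timestamp"})

# Engine-specific extension types, available only when an engine opts in.
EXTENSION_TYPES = frozenset({"ip", "geo_point"})

# Shared implicit-cast rules (hypothetical): every engine applies the same pairs.
IMPLICIT_CASTS = frozenset({("int", "bigint"), ("int", "double"), ("bigint", "double")})


@dataclass(frozen=True)
class EngineProfile:
    """An engine's declared support: all core types plus its opted-in extensions."""

    name: str
    extensions: frozenset[str] = field(default_factory=frozenset)

    def __post_init__(self) -> None:
        # Reject extensions that are not part of the shared catalog.
        unknown = set(self.extensions) - EXTENSION_TYPES
        if unknown:
            raise ValueError(f"unknown extension types: {sorted(unknown)}")

    def supports(self, type_name: str) -> bool:
        return type_name in CORE_TYPES or type_name in self.extensions


def can_implicitly_cast(src: str, dst: str) -> bool:
    """Same answer on every engine, because the rule table is shared."""
    return src == dst or (src, dst) in IMPLICIT_CASTS


def is_portable(schema: dict[str, str], engines: list[EngineProfile]) -> bool:
    """A schema is portable if every listed engine supports every column type."""
    return all(engine.supports(t) for engine in engines for t in schema.values())


if __name__ == "__main__":
    opensearch = EngineProfile("opensearch", frozenset({"ip", "geo_point"}))
    spark = EngineProfile("spark")  # assumed: no extension types enabled
    print(is_portable({"status": "int", "host": "string"}, [opensearch, spark]))  # True
    print(is_portable({"client": "ip"}, [opensearch, spark]))  # False
```

Because the cast table is shared rather than defined per engine, int-to-double widening behaves the same everywhere, and a schema that uses ip is flagged as non-portable up front instead of failing at query time on an engine without that extension.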

Do you have any additional context?

ZetaSQL data types: https://github.com/google/zetasql/blob/master/docs/data-types.md
OpenSearch PPL data types: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/general/datatypes.rst

Feb 17 '25 21:02 penghuo