[RFC] Unified PPL Data Type

Open penghuo opened this issue 9 months ago • 0 comments

Is your feature request related to a problem?

Current State: Fragmented Data Type Systems in PPL Engines

Query engines such as OpenSearch PPL and Spark PPL employ distinct data type systems, creating interoperability challenges in multi-engine environments. Key examples include:

  • Type Name Mismatches: OpenSearch PPL defines INTEGER (string representation: integer), while Spark PPL uses IntegerType (string representation: int). Both represent the same 32-bit signed integer, but the differing names disrupt cross-engine workflows.
  • Engine-Specific Types: OpenSearch PPL introduces specialized types like IP and GEO_POINT, which lack native equivalents in other engines.

Impact:

  • Integration Issues: Tools like OpenSearch Dashboards face parsing errors or misaligned visualizations when processing results from engines with mismatched type systems.
  • Manual Overhead: Users must rewrite queries or cast types explicitly when migrating between engines.

What solution would you like?

To eliminate friction and ensure seamless interoperability, all PPL-compliant engines should adopt a common data type system with the following principles:

  • Standardized Type Names: Universal type names and string representations (e.g., int instead of INTEGER or IntegerType).
  • Semantic Consistency: Equivalent types (e.g., 32-bit integers) must behave identically in syntax, casting rules, and operations (e.g., arithmetic, comparisons). Engine-specific types (e.g., ip, geo_point) should be opt-in extensions with clear documentation.
  • Interoperability Guarantee: Queries and schemas written for one engine should execute on other engines without manual adjustments.
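
To make the first two principles concrete, here is a minimal Python sketch of how a client could normalize engine-reported type names to one unified name. Everything in it is hypothetical and for illustration only: the `ALIASES` table, the `UnifiedType` class, and the `to_unified` function are not part of any PPL engine. Only the `integer`/`int` pair and the `ip`/`geo_point` extension names come from this proposal.

```python
from dataclasses import dataclass

# Hypothetical alias tables keyed by engine. Only the integer entries are
# taken from this RFC; a real table would cover every type.
ALIASES = {
    "opensearch": {"integer": "int"},  # OpenSearch PPL reports "integer"
    "spark": {"int": "int"},           # Spark PPL reports "int"
}

# Engine-specific types treated as opt-in extensions (names from this RFC).
EXTENSIONS = {"ip", "geo_point"}


@dataclass(frozen=True)
class UnifiedType:
    """A unified type name plus a flag marking opt-in extension types."""
    name: str
    extension: bool = False


def to_unified(engine: str, type_name: str, allow_extensions: bool = False) -> UnifiedType:
    """Map an engine-reported type string to its unified type."""
    key = type_name.strip().lower()
    if key in EXTENSIONS:
        # Extension types are rejected unless the caller opts in.
        if not allow_extensions:
            raise ValueError(f"extension type {key!r} requires allow_extensions=True")
        return UnifiedType(key, extension=True)
    try:
        return UnifiedType(ALIASES[engine][key])
    except KeyError:
        raise ValueError(f"unknown type {type_name!r} for engine {engine!r}") from None
```

With this sketch, `to_unified("opensearch", "integer")` and `to_unified("spark", "int")` compare equal, which is the property a consumer such as a dashboard would rely on. Extension types stay visible as such through the `extension` flag instead of being silently coerced.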

Do you have any additional context?

  • ZetaSQL common data type. https://github.com/google/zetasql/blob/master/docs/data-types.md
  • OpenSearch PPL data type. https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/general/datatypes.rst

penghuo, Feb 17 '25 21:02