
[RFC] Tenets of PPL command

Open • penghuo opened this issue 9 months ago • 1 comment

Background

PPL is designed as a sequence of commands chained together using the pipe operator (|). Each command takes a dataset as input and outputs a refined dataset for the next command. The query begins with a data source and flows through various transformation operators, much like a funnel that progressively filters, rearranges, or summarizes data.
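To make the pipe model concrete, here is a minimal Python sketch. It is an illustration only, not the PPL implementation. It assumes a dataset can be modeled as a list of rows, each a dict from field name to value. `pipe`, `where`, and `fields` are hypothetical helpers loosely named after PPL's `where` and `fields` commands, and the `accounts` data is made up. Each command is simply a function from dataset to dataset, so chaining the pipeline is a left fold over the list of commands.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

# Assumption: a dataset is a list of rows; each row maps field name -> value.
Row = Dict[str, Any]
Dataset = List[Row]
# A command is any function that takes a dataset and returns a new dataset.
Command = Callable[[Dataset], Dataset]


def pipe(source: Dataset, *commands: Command) -> Dataset:
    """Feed the source through each command in order, like `cmd1 | cmd2 | ...`."""
    return reduce(lambda data, cmd: cmd(data), commands, source)


def where(predicate: Callable[[Row], bool]) -> Command:
    """Hypothetical filter command: keeps rows for which predicate is true."""
    return lambda data: [row for row in data if predicate(row)]


def fields(*names: str) -> Command:
    """Hypothetical projection command: keeps only the named fields."""
    return lambda data: [{n: row[n] for n in names} for row in data]


accounts = [
    {"name": "alice", "age": 34, "city": "Seattle"},
    {"name": "bob", "age": 28, "city": "Austin"},
    {"name": "carol", "age": 41, "city": "Boston"},
]

# Roughly analogous to: source=accounts | where age > 30 | fields name, age
result = pipe(accounts, where(lambda r: r["age"] > 30), fields("name", "age"))
print(result)  # [{'name': 'alice', 'age': 34}, {'name': 'carol', 'age': 41}]
```

Because every command has the same dataset-in, dataset-out shape, a new command can be slotted into the chain without its neighbors needing to know anything about how it works internally.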

Publishing this RFC is essential to establish clear, standardized principles for developing new PPL commands. By formalizing these tenets, we ensure consistency across the language, reduce integration issues, and improve error handling. This guidance also supports contributors and maintainers by providing a common framework to build upon, ensuring that every command meets high standards for compatibility, robustness, and usability.

Tenets of PPL commands

  • Compatibility: Each command must accept a dataset as input and output a dataset, seamlessly integrating into the sequential pipe workflow.
  • Well-Defined Schema: Clearly specify the expected input fields and the resulting output fields. Validate that all referenced fields exist; otherwise, produce a clear, immediate error.
  • Robust Error Handling: The command should enforce semantic rules by checking for the existence of referenced fields and correct parameter usage, failing early with descriptive error messages if any issues arise.
  • Parameter Integrity: Required and optional parameters must be unambiguously defined.
  • Thorough Documentation and Testing: Update relevant documentation and include examples that illustrate usage. Comprehensive tests (both unit and integration) must verify that the command behaves as expected under various conditions.
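The tenets above can be sketched as a single command that declares its parameters, computes its output schema from its input schema, and rejects unknown fields before processing any rows. This is a hypothetical Python illustration, not the actual PPL engine: `Dataset`, `SemanticError`, and `Rename` are invented names, and the `Rename` semantics (including rejecting a target name that already exists) are assumptions loosely modeled on a rename-style command.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]


class SemanticError(Exception):
    """Raised during validation, before any rows are processed."""


@dataclass(frozen=True)
class Dataset:
    schema: List[str]  # ordered field names
    rows: List[Row]


@dataclass(frozen=True)
class Rename:
    """Hypothetical command: rename one field. Both parameters are required."""

    source_field: str  # required: existing field to rename
    target_field: str  # required: name the field should have in the output

    def output_schema(self, input_schema: List[str]) -> List[str]:
        # Parameter integrity: reject an empty target name up front.
        if not self.target_field:
            raise SemanticError("rename: new field name must not be empty")
        # Well-defined schema: the referenced field must exist in the input.
        if self.source_field not in input_schema:
            raise SemanticError(
                f"rename: field '{self.source_field}' not found; "
                f"available fields: {input_schema}"
            )
        # Assumed rule: renaming onto a different, existing field is an error.
        if self.target_field != self.source_field and self.target_field in input_schema:
            raise SemanticError(f"rename: field '{self.target_field}' already exists")
        return [self.target_field if f == self.source_field else f for f in input_schema]

    def __call__(self, data: Dataset) -> Dataset:
        # Validate against the schema first, so errors surface before any row work.
        schema = self.output_schema(data.schema)
        rows = [
            {(self.target_field if k == self.source_field else k): v for k, v in row.items()}
            for row in data.rows
        ]
        return Dataset(schema, rows)


data = Dataset(["name", "age"], [{"name": "alice", "age": 34}])

out = Rename("age", "years")(data)
print(out.schema)  # ['name', 'years']
print(out.rows)    # [{'name': 'alice', 'years': 34}]

try:
    Rename("salary", "pay")(data)
except SemanticError as err:
    print(err)  # rename: field 'salary' not found; available fields: ['name', 'age']
```

Keeping `output_schema` separate from `__call__` would let a planner validate a whole pipeline against schemas alone, so a misspelled field fails at analysis time rather than partway through execution. It also gives unit tests a small, deterministic surface to assert on.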

penghuo, Feb 20 '25 17:02

Catch All Triage - 1

andrross, Mar 10 '25 16:03