
[RFC] Standardizing OSD–PPL Interaction

Open penghuo opened this issue 2 months ago • 2 comments

Problem Statements

  • Incomplete language awareness in OSD typeahead: OSD cannot reliably detect the exact PPL version and the supported commands/functions for a connected cluster or engine. The lack of a machine-readable, versioned capabilities contract leads to incorrect or missing completions, stale suggestions, and confusing editor diagnostics.
  • Unreliable “raw events before first reduce” on the Events page: OSD is expected to display raw events up to (but not including) the first reducing operator (e.g., dedup, stats). Without an authoritative definition or introspection of reducing boundaries, OSD uses heuristics that fail on complex pipelines (subqueries, macros, mvexpand, field mutations), causing incorrect previews and user confusion.
  • Ambiguity in “raw events” for the Timeline page: The Timeline page should visualize counts over raw events using an effective timestamp field before any reducing step. OSD lacks a consistent, query-aware definition of “raw events”.
  • No first-class cancellation for long-running PPL queries: The frontend needs to cancel in-flight or superseded queries (while typing, on tab switches, on navigation). The lack of a Cancel API and stable query identifiers prevents a responsive UX, wastes cluster resources, and complicates concurrency control in OSD.
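
To make the second problem concrete, here is a minimal sketch of the kind of client-side heuristic described above. It is not OSD's actual code. The names `REDUCING_COMMANDS`, `split_top_level`, and `raw_events_prefix` are hypothetical, the set of reducing commands contains only the two examples named in this RFC, and the handling of quotes and `[ ]` subqueries is an assumption about the syntax.

```python
# Assumption: only the two reducing commands named in the RFC text.
REDUCING_COMMANDS = {"dedup", "stats"}


def split_top_level(query: str) -> list[str]:
    """Split a query on '|' characters that sit outside quotes and [ ] blocks."""
    stages, current = [], []
    depth = 0      # nesting depth of [ ... ] blocks (assumed subquery syntax)
    quote = None   # the quote character we are currently inside, if any
    for ch in query:
        if quote:
            if ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth = max(depth - 1, 0)
        elif ch == "|" and depth == 0:
            stages.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    stages.append("".join(current).strip())
    return stages


def raw_events_prefix(query: str) -> str:
    """Return the pipeline up to, but not including, the first reducing command."""
    kept = []
    for stage in split_top_level(query):
        words = stage.split(None, 1)
        command = words[0].lower() if words else ""
        if command in REDUCING_COMMANDS:
            break
        kept.append(stage)
    return " | ".join(kept)


print(raw_events_prefix(
    "source = logs | where status = 500 | stats count() by host | sort host"
))
```

Even this small version has to hard-code a command list and guess at quoting rules (escaped quotes are not handled), and it knows nothing about macros or commands added in newer PPL versions. That is the gap an authoritative, engine-provided definition of the reducing boundary would close.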

penghuo • Oct 13 '25 21:10

We might need to decouple PPL and give it its own versioning in order to achieve this.

anasalkouz • Oct 13 '25 21:10

Hi @penghuo, I'd like to discuss two potential enhancements for the PPL API:

Multi-Search API for PPL

  • Could we implement a PPL equivalent to OpenSearch's _msearch API? (Reference)
  • This would enable batching multiple PPL queries in a single request
  • Use case: Dashboard visualizations where multiple widgets each run their own PPL query
  • Currently, we can batch DSL searches, but there's no equivalent for PPL queries

Partial Bucket Indicator

  • Could we add metadata to identify "partial" buckets in aggregation results?
  • Example query:
source = opensearch_dashboards_sample_data_logs
| WHERE timestamp >= '2025-07-31 14:58:04.724' AND timestamp <= '2025-08-01 02:59:18.664' 
| STATS avg(bytes) by span(timestamp, 3h), extension
  • The bucket 2025-07-31 12:00:00 contains partial data due to the time range filter
  • This would allow UIs to visually indicate incomplete buckets in visualizations
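
As an illustration of what the indicator would need to convey, the sketch below computes which buckets a time filter covers only partly. It assumes buckets are aligned to multiples of the span counted from the Unix epoch and that a bucket `[b, b + span)` counts as complete only when the filter range contains all of it. The helper names `bucket_start` and `partial_buckets` are hypothetical and are not part of any PPL API.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bucket_start(ts: datetime, span: timedelta) -> datetime:
    """Round a timestamp down to the start of its span-aligned bucket."""
    return ts - (ts - EPOCH) % span


def partial_buckets(start: datetime, end: datetime, span: timedelta) -> list[datetime]:
    """Return the start times of buckets that [start, end] covers only partly."""
    partial = []
    bucket = bucket_start(start, span)
    while bucket <= end:
        # Partial if the bucket begins before the range or runs past its end.
        if bucket < start or bucket + span > end:
            partial.append(bucket)
        bucket += span
    return partial


result = partial_buckets(
    datetime(2025, 7, 31, 14, 58, 4, 724000),
    datetime(2025, 8, 1, 2, 59, 18, 664000),
    timedelta(hours=3),
)
for b in result:
    print(b)
```

Under these assumptions the example query has two partial buckets: the leading one at 2025-07-31 12:00:00 mentioned above, and the trailing one at 2025-08-01 00:00:00, because the filter ends shortly before 03:00:00. A server-side flag would spare each UI from re-deriving this from the filter and the span.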

ruanyl • Oct 29 '25 07:10