presto icon indicating copy to clipboard operation
presto copied to clipboard

Fix inconsistent ordering with offset and limit

Open xieandrew opened this issue 6 months ago • 1 comments

Description

Fixes inconsistent output order when using offset and limit together by passing through parent StreamPreferredProperties in AddLocalExchanges. This prevents visitLimit from overriding the parent orderSensitive field, which currently causes the loss of ordering. This change causes order-breaking Round Robin exchanges to be removed from the logical plan.

An implementation of visitOffset in PlanPrinter was also added so that debugging with the session property verbose_optimizer_info_enabled will not cause "not yet implemented" errors.

Motivation and Context

Fixes #25071.

Impact

The offset clause when used with limit and order by will now produce accurate results.

Test Plan

I ran the following test query both before and after this change on the HiveQueryRunner: select c_customer_id from hive.tpcds.customer order by c_customer_id offset 1 limit 800;

Without this change, the query produces completely different output when retried multiple times. With this change, the query returns consistent output every time. I also compared the logical plans using explain (type logical) and the Round Robin exchanges were removed after this change.

Contributor checklist

  • [x] Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • [x] If release notes are required, they follow the release notes guidelines.
  • [x] Adequate tests were added if applicable.
  • [ ] CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix inconsistent ordering with offset and limit

xieandrew avatar May 28 '25 16:05 xieandrew

I added the requested unit test for checking result consistency over 5 runs. The minimum limit value required is 256 as the bug needs multiple pages to trigger. This test passes with this change and fails without this change (failed 50/50 times with 5 query runs per test run, max query runs required to fail was 4). Not sure where to specify query runner parallelism but the bug can be reproduced already anyways. Let me know if I need to move this test to a different package/class.

xieandrew avatar May 28 '25 19:05 xieandrew

An implementation of visitOffset in PlanPrinter was also added so that debugging with the session property verbose_optimizer_info_enabled will not cause "not yet implemented" errors.

I don't find verbose_optimizer_info_enabled in the Presto documentation, and I don't think it's appropriate to add that doc requirement to this PR.

Should I open a doc issue for the missing session property?

steveburnett avatar Jun 25 '25 17:06 steveburnett

I'm not sure what the policy is on documenting session properties is since there are a lot of missing ones, though this one is only used for debugging during development. Separately, it may be important to document the offset_clause_enabled session prop and offset-clause-enabled configuration prop since those are needed by the user to actually use the main feature this fix relates to. I only see those properties in a release note: https://prestodb.io/docs/current/release/release-0.257.html#general-changes

xieandrew avatar Jun 25 '25 18:06 xieandrew

I'm not sure what the policy is on documenting session properties is since there are a lot of missing ones, though this one is only used for debugging during development. Separately, it may be important to document the offset_clause_enabled session prop and offset-clause-enabled configuration prop since those are needed by the user to actually use the main feature this fix relates to. I only see those properties in a release note: https://prestodb.io/docs/current/release/release-0.257.html#general-changes

The policy is that they should all be documented but many never were. I am trying to address that as I notice individual gaps until I can plan how to catch up on all of them.

Thank you for the help you gave by identifying offset_clause_enabled session prop and offset-clause-enabled configuration properties! I will open a doc issue for these missing properties.

steveburnett avatar Jun 25 '25 18:06 steveburnett

@ZacBlanco could I get a final review on this?

xieandrew avatar Jul 10 '25 19:07 xieandrew