presto
presto copied to clipboard
Remove the newer SqlQueryScheduler
Remove the "new" SqlQueryScheduler, which was created to support stage retries, but was never rolled out and is no longer important because of Presto-on-Spark. Rename LegacySqlQueryScheduler to SqlQueryScheduler.
Description
Remove the SqlQueryScheduler and rename LegacySqlQueryScheduler back to SqlQueryScheduler. Remove corresponding session properties that are no longer relevant.
Motivation and Context
The new SqlQueryScheduler, which was built to enable stage retries when materializing exchanges has never been rolled out, and has a bug that causes it to use a lot of cpu that has not been resolved (was disabled here, https://github.com/prestodb/presto/pull/14264 and an attempt to fix it here https://github.com/prestodb/presto/pull/14879 did not resolve the issue).
The new scheduler was deprioritized with the investment in presto-on-spark. This PR removes the "new" scheduler, which has been tech debt for a while since we are not investing in fixing it and making it stable for production.
The LegacySqlQueryScheduler has been the default, so there is no user facing change except for removing the ability to try out the new scheduler.
Fixes https://github.com/prestodb/presto/issues/21886
Impact
Removes ability to use the experimental SqlQueryScheduler that allowed for stage retries
Test Plan
CI
Contributor checklist
- [X] Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
- [X] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [x] If release notes are required, they follow the release notes guidelines.
- [x] Adequate tests were added if applicable.
- [ ] CI passed.
Release Notes
Please follow release notes guidelines and fill in the release notes below.
== RELEASE NOTES ==
General Changes
* Remove the configuration property ``use-legacy-scheduler`` and the corresponding session property ``use_legacy_scheduler`. The property previously defaulted to true, and the new scheduler, which was intended to replace it eventually, was never productionized and is no longer needed. The configuration property ``max-stage-retries`` and the session property ``max_stage_retries`` have also been removed.
If it is a useful/functional piece we should keep it. Who knows maybe we will find use for it!
If it is a useful/functional piece we should keep it. Who knows maybe we will find use for it!
The problem is that it's buggy and it's not clear how much effort it would take to be confident enough to use it in production. If the feature were important for us now, i would 100% say let's keep it and take it that last mile. But it's not really a priority, so in the meantime it's tech debt that makes it harder to add features to the scheduler (you need to add everything in two places in case one day someone decides we should actually productionize and use the new scheduler).
If it comes in handy later, we can simply recover it from Git history. I agree with @rschlussel that there's costs to maintaining this class, and while for this one class it might be small, dead code over the whole project it is a significant technical burden and I think we should err on the side of removing unused functionality.