
Restrict the amount of data written during CTE Materialization

Open jaystarshot opened this issue 1 year ago • 0 comments

Description

The volume of data written by materialized CTE queries needs a limit, so that poorly designed queries cannot consume excessive storage space and compute.

This PR adds a limit on the writtenIntermediateBytes metric and fails the query if the limit is exceeded. The check extends the existing background check that runs every second, and it is applied only in the context of CTE materialization.
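The check described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class, the `QueryState` type, and the method names are hypothetical, and the 2TB default is assumed to use binary units. Only the `writtenIntermediateBytes` metric name and the default value come from the description.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names are taken from Presto's code base.
class WrittenIntermediateBytesLimitSketch {
    // Default of 2TB from the release note, assuming binary units (2 * 2^40 bytes).
    static final long DEFAULT_LIMIT_BYTES = 2L * (1L << 40);

    // Stand-in for the per-query state that a coordinator would track.
    static final class QueryState {
        final String queryId;
        final boolean cteMaterializationEnabled;
        final AtomicLong writtenIntermediateBytes = new AtomicLong();

        QueryState(String queryId, boolean cteMaterializationEnabled) {
            this.queryId = queryId;
            this.cteMaterializationEnabled = cteMaterializationEnabled;
        }
    }

    // True only when the query materializes CTEs and has written more than the limit.
    static boolean exceedsLimit(QueryState query, long limitBytes) {
        return query.cteMaterializationEnabled
                && query.writtenIntermediateBytes.get() > limitBytes;
    }

    // Meant to be called from a periodic background check (once per second in the PR text).
    static void enforceLimit(QueryState query, long limitBytes) {
        if (exceedsLimit(query, limitBytes)) {
            throw new IllegalStateException(String.format(
                    "Query %s exceeded the written intermediate bytes limit of %d bytes",
                    query.queryId, limitBytes));
        }
    }
}
```

Gating on the CTE materialization flag keeps queries that never materialize CTEs unaffected, even if they report intermediate bytes for other reasons.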

Motivation and Context

Impact

Test Plan

Unit tests + tested on a cluster

Contributor checklist

  • [ ] Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • [ ] If release notes are required, they follow the release notes guidelines.
  • [ ] Adequate tests were added if applicable.
  • [ ] CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add a limit on the amount of data written during CTE materialization. The limit is configurable through the session property ``query_max_written_intermediate_bytes`` (default is 2TB).



jaystarshot • Feb 27 '24 05:02