druid icon indicating copy to clipboard operation
druid copied to clipboard

Validate datasource retention rules: Reject rules that fully contain subsequent rules' interval

Open abhishekrb19 opened this issue 2 years ago • 5 comments

Druid's retention rules are evaluated in the order in which they are set. Unfortunately, a mistake in the order of rules by an operator can lead to unintentional data loss, and if it goes unnoticed and auto-kill is enabled, data could be permanently deleted.

To address this issue, a guard rail is added in the rules API. This guard rail aims to identify rules that fully contains the interval for any subsequent rules A rule is considered good as long as it will run at some point (and doesn't fully hide other rules in the chain) - it's ensured by checking at the interval boundaries as well.

Some examples of bad rules that will be disallowed (see unit tests for more examples):

  • [loadByPeriod(P6M), loadByPeriod(P3M)]
  • [loadByInterval(2020-01-01/2050-01-01), loadByInterval(2021-01-01/2023-01-01)]
  • [dropBeforeByPeriod(P3M), dropBeforeByPeriod(P6M)]
  • [loadForever, loadByPeriod(P1M)]
  • [loadForever, dropForever]

Examples of rules that will be allowed because they will be evaluated at some point:

  • [loadByPeriod(P5Y), loadByInterval(2021-01-01/2023-01-01)]; the loadByInterval rule will be evaluated at some point.
  • [loadByPeriod(P1Y), loadByPeriod(P2Y)]

For testing purposes, if a user wants to intentionally set a rule that fully contains other rules, they can do it temporarily and then fetch the correct set of rules from the rules audit history after their testing is done.

Release note

Datasource rules API will reject the rules payload if a rule fully contains subsequent rules' interval.


Key changed/added classes in this PR
  • Rule.java
  • Rules.java
  • RulesTest.java
  • RulesEligibleIntervalTest.java

This PR has:

  • [x] been self-reviewed.
  • [x] a release note entry in the PR description.
  • [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • [x] been tested in a test Druid cluster.

abhishekrb19 avatar Sep 19 '23 21:09 abhishekrb19

@abhishekrb19 , it seems like a nice idea to have these checks while updating retention rules.

If there is a cluster that already defines rules that would fail the check, would that cluster break after this change? Edit: I suppose not since the check is enforced only when updating rules.

kfaraz avatar Sep 20 '23 03:09 kfaraz

Thanks for the review, @kfaraz. Re:

If there is a cluster that already defines rules that would fail the check, would that cluster break after this change?

No, this is an API-only validation change, so any existing rules in a cluster will remain as-is.

abhishekrb19 avatar Sep 20 '23 04:09 abhishekrb19

To allow for the introduction of new rules and cases where the user does want to add these absurd rules that have been supported till now, WDYT of having a URL parameter ?forced which bypasses these validation checks. The new check would be the default, however, if someone wants to override it, they can set the flag and bypass them.

LakshSingla avatar Sep 21 '23 07:09 LakshSingla

@LakshSingla, re:

where the user does want to add these absurd rules that have been supported till now, WDYT of having a URL parameter ?forced which bypasses these validation checks.

hmm, a parameter to bypass API validation seems a bit hacky to me. If users want to "temporarily" set something for testing or so, they can set effective rules and then always retrieve the "actual" rules from the audit history from the console/API. I think that should suffice?

abhishekrb19 avatar Sep 27 '23 05:09 abhishekrb19

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Mar 05 '24 00:03 github-actions[bot]

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions[bot] avatar Apr 03 '24 00:04 github-actions[bot]