airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Fix incorrect data interval alignment due to assumption on input time alignment

Open wanlce opened this issue 2 years ago • 5 comments

This introduces a new _align_to_prev util function for cron-based interval alignment. Previously we assumed _get_prev(_align_to_next(t)) is enough for this purpose, but this proved incorrect (see discussions in #21715), and an additional alignment logic is needed to avoid the inferred interval being off by one period when the input value has unexpected alignment (unexpectedly aligning for a manual run, and unexpected not aligning for an auto one).

See also #21715 Fix #23689

wanlce avatar Mar 31 '22 16:03 wanlce

I am very sorry, because my wrong rebase caused the old request to be closed. :sob: I wasn't sure where these changes needed to be documented, thank you so much for your help!

wanlce avatar Mar 31 '22 16:03 wanlce

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

github-actions[bot] avatar Apr 12 '22 07:04 github-actions[bot]

Anyone else wants to take a look at this one? It’s a relatively straightforward change, since we established that this additional method is needed.

uranusjr avatar Aug 08 '22 06:08 uranusjr

I think what woudl be great here is to add a few more tests showing the different cases and maybe explaining the context in the test a bit better?

The case here is either not tested with the edge case or it's not clear that it is tested I think:

    def _align_to_prev(self, current: DateTime) -> DateTime:
        """Get the prev scheduled time.
        This is ``current - interval``, unless ``current`` falls right on the
        interval boundary, when ``current`` is returned.
        """
        prev_time = self._get_prev(current)
        if self._get_next(prev_time) != current:
            return prev_time
        return current

potiuk avatar Aug 08 '22 07:08 potiuk

Actually this is also the cause of #23689. I’ve added a few test cases here to illustrate those issues, and modified the PR description to better explain the problems this solve.

uranusjr avatar Aug 09 '22 09:08 uranusjr