etna icon indicating copy to clipboard operation
etna copied to clipboard

Make `determine_num_steps` more optimal

Open Mr-Geekman opened this issue 1 year ago • 0 comments

🚀 Feature Request

Currenlty, determine_num_steps uses linear probing that results in $O(n^2)$ time. We can reduce it to $O(n)$ by making a next step from the last tried point instead of beginning.

Proposal

def determine_num_steps(start_timestamp: pd.Timestamp, end_timestamp: pd.Timestamp, freq: str) -> int:
    if start_timestamp > end_timestamp:
        raise ValueError("Start train timestamp should be less or equal than end timestamp!")

    # check if start_timestamp is normalized
    normalized_start_timestamp = pd.date_range(start=start_timestamp, periods=1, freq=freq)
    if normalized_start_timestamp != start_timestamp:
        raise ValueError(f"Start timestamp isn't correct according to given frequency: {freq}")

    # check a simple case
    if start_timestamp == end_timestamp:
        return 0

    # make linear probing, because for complex offsets there is a cycle in `pd.date_range`
    cur_timestamp = start_timestamp
    cur_value = 1
    while True:
        timestamps = pd.date_range(start=cur_timestamp, periods=2, freq=freq)
        if timestamps[-1] == end_timestamp:
            return cur_value
        elif timestamps[-1] > end_timestamp:
            raise ValueError(f"End timestamp isn't reachable with freq: {freq}")
        cur_timestamp = timestamps[-1]
        cur_value += 1

Test cases

Make sure current tests pass.

Additional context

No response

Mr-Geekman avatar Aug 12 '22 15:08 Mr-Geekman