etna
etna copied to clipboard
Make `determine_num_steps` more optimal
🚀 Feature Request
Currenlty, determine_num_steps
uses linear probing that results in $O(n^2)$ time. We can reduce it to $O(n)$ by making a next step from the last tried point instead of beginning.
Proposal
def determine_num_steps(start_timestamp: pd.Timestamp, end_timestamp: pd.Timestamp, freq: str) -> int:
if start_timestamp > end_timestamp:
raise ValueError("Start train timestamp should be less or equal than end timestamp!")
# check if start_timestamp is normalized
normalized_start_timestamp = pd.date_range(start=start_timestamp, periods=1, freq=freq)
if normalized_start_timestamp != start_timestamp:
raise ValueError(f"Start timestamp isn't correct according to given frequency: {freq}")
# check a simple case
if start_timestamp == end_timestamp:
return 0
# make linear probing, because for complex offsets there is a cycle in `pd.date_range`
cur_timestamp = start_timestamp
cur_value = 1
while True:
timestamps = pd.date_range(start=cur_timestamp, periods=2, freq=freq)
if timestamps[-1] == end_timestamp:
return cur_value
elif timestamps[-1] > end_timestamp:
raise ValueError(f"End timestamp isn't reachable with freq: {freq}")
cur_timestamp = timestamps[-1]
cur_value += 1
Test cases
Make sure current tests pass.
Additional context
No response