Enhancement/validations on update

Open janrth opened this issue 2 months ago • 1 comments

Tries to solve #358

Validates that the update df has the expected shape: for each unique_id, the update must start right after the last ds seen in the previous df and contain the expected number of ds values.

For each unique_id, the number of ds date points in the update df is counted and compared to the expected number of date points, which is calculated from the expected start date and the end date. The expected start date is derived from the stored series' last date + offset(freq).
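To make the check above concrete, here is a minimal sketch of the idea in pandas. It is not the code in this PR: the function name `find_invalid_update_ids` and the `last_dates` Series (unique_id -> last stored ds) are hypothetical, and the end date is assumed to be the last ds observed in the update for each unique_id.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def find_invalid_update_ids(last_dates, update_df, freq):
    """Return the unique_ids whose update has a wrong start or a wrong count.

    last_dates is a hypothetical Series mapping unique_id -> last stored ds.
    """
    offset = to_offset(freq)
    invalid = []
    for uid, ds in update_df.groupby("unique_id")["ds"]:
        if uid not in last_dates.index:
            continue  # assumption: new series have nothing to compare against
        # Expected start is the stored last date plus one frequency step.
        expected_start = last_dates[uid] + offset
        # Expected count is the size of a gap-free range up to the observed end.
        expected_n = len(pd.date_range(expected_start, ds.max(), freq=freq))
        if ds.min() != expected_start or len(ds) != expected_n:
            invalid.append(uid)
    return invalid


last_dates = pd.Series(
    {"a": pd.Timestamp("2024-01-03"), "b": pd.Timestamp("2024-01-03")}
)
update_df = pd.DataFrame(
    {
        "unique_id": ["a", "a", "b", "b"],
        # "a" skips 2024-01-05, "b" is contiguous
        "ds": pd.to_datetime(
            ["2024-01-04", "2024-01-06", "2024-01-04", "2024-01-05"]
        ),
    }
)
invalid_ids = find_invalid_update_ids(last_dates, update_df, "D")
```

In this sketch a gap, a late start, or a duplicated ds all show up as a mismatch in either the start date or the count, so the unique_id is reported.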

There is an option to turn off the validate_input step. The check is fast overall, but it could still become a noticeable cost if one has hundreds of millions of rows.

Initially I only checked whether the first ds of the update is in the future for each unique_id, but that did not verify much, so I implemented a stronger check. The issue itself is a bit vague and I am open to any changes, as I implemented this based on my interpretation of the task.

Description

Tries to implement more checks on the update df.

Checklist:

[x] This PR has a meaningful title and a clear description.
[x] The tests pass.
[x] All linting tasks pass.
[x] The notebooks are clean.

Dec 14 '25 19:12 janrth