Anomaly detection. Detection period value is included in the calculation of the AVG value for the training period
Describe the bug
We recently had a case where a monitored value increased by around 25%, but the anomaly detection test succeeded with anomaly_sensitivity set to 2 and anomaly_direction set to both.
It was also unclear where Elementary takes the AVG value from. After some research, I found out that when the AVG value is calculated, the detection period value is taken into account along with the training period values.
As a result, the value being examined affects the AVG value it is compared against. I am not sure whether this is the correct behaviour.
In the picture, the value of TRAINING_AVG, highlighted in yellow, is calculated as the AVG of the 3 previous results PLUS the current one:
There may be an inaccuracy in the window function. I also found a similar topic in the Slack channel.
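To illustrate why this matters, here is a minimal Python sketch. It is not Elementary's actual SQL. It assumes the anomaly score is a plain z-score, (value - average) / sample standard deviation, and the numbers and the `z_score` helper are hypothetical, chosen to mimic three stable training results followed by a 25% jump.

```python
from statistics import mean, stdev


def z_score(training, detected, include_detected):
    """Z-score of `detected` against a baseline built from `training`.

    include_detected=True mimics the behaviour described above, where the
    detection-period value is also part of the averaging window.
    """
    window = training + [detected] if include_detected else training
    return (detected - mean(window)) / stdev(window)


# Hypothetical numbers: three stable training results, then a 25% jump.
training = [100.0, 102.0, 98.0]
detected = 125.0
sensitivity = 2  # plays the role of anomaly_sensitivity

leaky = z_score(training, detected, include_detected=True)
clean = z_score(training, detected, include_detected=False)

print(f"detection value in window: z = {leaky:.2f}, anomaly = {abs(leaky) > sensitivity}")
print(f"detection value excluded:  z = {clean:.2f}, anomaly = {abs(clean) > sensitivity}")
```

With these numbers the score is about 1.49 when the detection value is part of the window (test passes) and 12.5 when it is excluded (test fails), because the outlier both pulls the average towards itself and inflates the standard deviation.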
Expected behavior
The Elementary anomaly detection test fails.
Environment (please complete the following information):
- Elementary dbt package version: 0.15.2
- Data warehouse: Snowflake
Hello there, and thanks for providing this great package!
We've run into the same situation and reached the same conclusion: the detection period should be excluded from the training dataset. From a prediction perspective, training on the detection period can have a big impact on the anomaly score, and therefore affect the results (false positives / false negatives). I believe it's common practice to keep train and test sets fully separated so that no test data "leaks" into the model and biases it.
This issue is probably the same as this one BTW: https://github.com/elementary-data/elementary/issues/1491
Are you planning to address this issue in the near future? Many thanks!