ignore_small_changes on BigQuery
Hello, this is a copy of my message on Slack.
Platform: BigQuery
I have a problem with the ignore_small_changes params: when I set spike_failure_percent_threshold to 2%, I get a very strange result for the threshold range in my volume anomaly test, literally a 2-fold increase over the training_avg value. After a little research, it seems this is caused by an operator-precedence problem. Let me explain: the max value for the volume anomaly test is computed in get_anomaly_scores_query.sql (tell me if I'm getting something wrong) with the following code:
{%- set max_val -%}
{{ test_configuration.anomaly_sensitivity }} * training_stddev + training_avg
{%- endset -%}
{% if test_configuration.ignore_small_changes.spike_failure_percent_threshold %}
{%- set spike_avg_threshold -%}
((1 + {{ test_configuration.ignore_small_changes.spike_failure_percent_threshold }}/100.0) * training_avg)
{%- endset -%}
{%- set max_val -%}
{{ elementary.arithmetic_max(spike_avg_threshold, max_val) }}
{%- endset -%}
{% endif %}
and the elementary.arithmetic_max function is used. This function is relatively simple and returns, as its name suggests, the larger of spike_avg_threshold and max_val (i.e. f(x, y) = 0.5 * (x + y + |x - y|)).
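Just to illustrate that identity, here is a quick sanity check with made-up values (greatest is only there as a reference to compare against):
-- Example values only: checking that 0.5 * (x + y + abs(x - y)) behaves like greatest(x, y)
select
  x,
  y,
  0.5 * (x + y + abs(x - y)) as arithmetic_max,
  greatest(x, y) as expected_max
from unnest([
  struct(10.0 as x, 3.0 as y),
  struct(2.0 as x, 7.0 as y)
])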
If we now look at the final calculation in the SQL query generated on BigQuery, we get the following field:
(0.5 * (((1 + 100/100.0) * training_avg) + 2 * training_stddev + training_avg + abs(((1 + 100/100.0) * training_avg) - 2 * training_stddev + training_avg))) end as max_metric_value,
Where spike_avg_threshold = ((1 + 100/100.0) * training_avg) = x and max_val = 2 * training_stddev + training_avg = y
We can now easily see that the calculation will be wrong because of the "2 * training_stddev + training_avg" part: since max_val is inlined as plain text without parentheses around it, operator precedence makes the subtraction inside abs() apply only to 2 * training_stddev, with training_avg added back "at the end", i.e. we get abs(x - 2 * training_stddev + training_avg) instead of abs(x - (2 * training_stddev + training_avg)). The result is therefore not the expected one, namely the larger of spike_avg_threshold and max_val.
For example, if I set spike_failure_percent_threshold: 2, I get an acceptance margin that is doubled (+100%) and not just +2%.
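Here is a small reproduction of the arithmetic with made-up numbers (training_avg = 1000 and training_stddev = 50 are just example values, not from my data), where x = spike_avg_threshold = 1.02 * 1000 = 1020 and y = max_val = 2 * 50 + 1000 = 1100:
-- Example values only, to show the effect of the missing parentheses
select
  0.5 * (1020 + 2 * 50 + 1000 + abs(1020 - 2 * 50 + 1000)) as buggy_max_metric_value,        -- 2020, roughly 2 * training_avg
  0.5 * (1020 + (2 * 50 + 1000) + abs(1020 - (2 * 50 + 1000))) as expected_max_metric_value  -- 1100, the real maximum of x and y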
Logically, adding parentheses around this problematic part should be enough to fix the error, but I wanted to explain it to you to make sure I hadn't missed anything or made a mistake myself.
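For what it's worth, this is the kind of change I had in mind, only a sketch against the snippet above and not a tested patch (the parentheses could equally be added wherever max_val is first set):
{%- set max_val -%}
({{ test_configuration.anomaly_sensitivity }} * training_stddev + training_avg)
{%- endset -%}
{# With the expression wrapped in parentheses, arithmetic_max(spike_avg_threshold, max_val)
   renders abs(x - (2 * training_stddev + training_avg)) and returns the real maximum. #}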
I'm available if my explanation isn't clear. Good day to you.
Hi @Emmanuel2RPA! Thanks for opening this and for the detailed explanation. I believe this bug was actually fixed here; we haven't released it in an official version just yet, but we'll do so soon. There were indeed missing parentheses.