
ignore_small_changes on BigQuery

Open · Emmanuel2RPA opened this issue 1 year ago · 1 comment

Hello, this is a copy of my message on Slack.

Platform: BigQuery

I have a problem with the ignore_small_changes parameters. When I try to set spike_failure_percent_threshold to 2%, I get a very strange result for the threshold range in my volume anomaly test: it is literally a 2-fold increase over the training_avg value. After a little research, it seems this is caused by an operator-precedence problem. Let me explain: the max value for the volume anomaly test is computed in get_anomaly_scores_query.sql (tell me if I'm saying something wrong) with the following code:

    {%- set max_val -%}
      {{ test_configuration.anomaly_sensitivity }} * training_stddev + training_avg
    {%- endset -%}
    {% if test_configuration.ignore_small_changes.spike_failure_percent_threshold %}
      {%- set spike_avg_threshold -%}
        ((1 + {{ test_configuration.ignore_small_changes.spike_failure_percent_threshold }}/100.0) * training_avg)
      {%- endset -%}
      {%- set max_val -%}
        {{ elementary.arithmetic_max(spike_avg_threshold, max_val) }}
      {%- endset -%}
    {% endif %}

and the elementary.arithmetic_max function is used. This function is relatively simple and returns, as its name suggests, the larger of spike_avg_threshold and max_val, using the identity f(x, y) = 0.5 * (x + y + |x − y|).
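As a quick sanity check of that identity (with arbitrary literal values I picked for illustration, not values from the project), it does return the larger operand as long as each operand is a self-contained expression:

    -- illustrating max(x, y) = 0.5 * (x + y + |x - y|) with x = 3, y = 7
    select 0.5 * (3 + 7 + abs(3 - 7)) as max_3_7  -- returns 7.0

The identity itself is correct; the problem below comes from how the operands are substituted in as raw text.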

If we now look at the final calculation in the SQL query rendered on BigQuery, we obtain the following field:

    (0.5 * (((1 + 100/100.0) * training_avg) + 2 * training_stddev + training_avg
            + abs(((1 + 100/100.0) * training_avg) - 2 * training_stddev + training_avg))) end as max_metric_value,

Here spike_avg_threshold = ((1 + 100/100.0) * training_avg) = x and max_val = 2 * training_stddev + training_avg = y. We can now easily see that the calculation goes wrong because of the "2 * training_stddev + training_avg" part: since the substituted expression is not parenthesized, the subtraction in |x − y| applies only to the 2 * training_stddev term, and training_avg is then added back at the end. In other words, abs(x - 2 * training_stddev + training_avg) is evaluated as abs((x - 2 * training_stddev) + training_avg) instead of abs(x - (2 * training_stddev + training_avg)). This makes the calculation incorrect and returns a result other than the expected one, i.e. the larger of spike_avg_threshold and max_val. In fact, whenever x - 2 * training_stddev + training_avg is positive, the whole field simplifies to x + training_avg, which is roughly twice training_avg. For example, if I set spike_failure_percent_threshold: 2, I get an acceptance margin multiplied by 2 (100%) and not just 2%.
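To make this concrete, here is a minimal query (with hypothetical values training_avg = 1000, training_stddev = 50, and threshold = 2%, chosen only for illustration) comparing the expression as currently rendered with a fully parenthesized version; it runs on BigQuery or any standard SQL engine:

    -- hypothetical values: x = spike_avg_threshold = (1 + 2/100.0) * 1000 = 1020
    --                      y = max_val             = 2 * 50 + 1000        = 1100
    select
      -- as currently rendered: abs() subtracts only 2 * training_stddev,
      -- then adds training_avg back
      0.5 * ((1 + 2/100.0) * 1000 + 2 * 50 + 1000
             + abs((1 + 2/100.0) * 1000 - 2 * 50 + 1000)) as buggy_max,    -- 2020.0
      -- fully parenthesized operands: a correct max(x, y)
      0.5 * (((1 + 2/100.0) * 1000) + (2 * 50 + 1000)
             + abs(((1 + 2/100.0) * 1000) - (2 * 50 + 1000))) as fixed_max -- 1100.0

buggy_max lands at 2020, roughly twice training_avg, instead of the expected 1100.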

Logically, it would "be enough" to add parentheses around this problematic part to resolve the error, but I wanted to explain it to you to make sure I hadn't missed anything or made a mistake.
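For illustration only (this is a hypothetical sketch of a fix, not necessarily what was or will be merged), parenthesizing each operand inside the macro would make the textual substitution safe regardless of which expression strings are passed in:

    {# hypothetical sketch: wrap each operand so precedence cannot leak across the substitution #}
    {% macro arithmetic_max(x, y) %}
      0.5 * (({{ x }}) + ({{ y }}) + abs(({{ x }}) - ({{ y }})))
    {% endmacro %}

Alternatively, the call site could wrap max_val in parentheses when it is set, but fixing the macro would protect every caller at once.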

I'm available if my explanation isn't clear. Good day to you.

Emmanuel2RPA · Jun 05 '24 14:06

Hi @Emmanuel2RPA! Thanks for opening this and for the detailed explanation. I believe this bug was actually fixed here, though we haven't released it in an official version just yet; we'll do so soon. There were indeed missing parentheses.

haritamar · Jun 25 '24 13:06