power-grid-model icon indicating copy to clipboard operation
power-grid-model copied to clipboard

[FEATURE] *Validity Check and Bad Data Handling in State Estimation*

Open scud-soptim opened this issue 1 year ago • 6 comments

The new feature should introduce a function to calculate the expected value $E$ and the actual value of the minimization function $J(x)$. If $J(x)$ lies within the 3-sigma band, the result of the state estimation is considered valid. Additionally, bad data should not be included in the determination of $J(x)$, or should only be considered with an adjusted weight in the calculation of $J(x)$.

Detailed Description:

  1. Calculation of Expected Value $E$ and Variance of $E$:

    • The function should calculate the expected value $E$ as $E = m - n$, where:
      • $m$ is the number of measurements.
      • $n$ is the number of state variables ( = 2 * Number of Busses - Slack )
    • The variance $\sigma_J^2$ should be calculated as $\sigma_J^2 = 2(m - n)$. so: $\sigma_J = sqrt(2*(m-n))$
  2. Calculation of Minimization Function $J(x)$:

    • The function should compute ( J(x) ) as the sum of the weighted squared residuals:

      $J(x) = \sum_{i=1}^{m} \left( \frac{r_i}{\sigma_i} \right)^2$

    • The residual $r_i$ is defined as $r_i = z_i - h_i(x)$, where $z_i$ is the measurement and $h_i(x)$ is the predicted value based on the state $x$.

  3. Validity Check within the 3-Sigma Band:

    • Check if $J(x)$ lies within the 3-sigma band: $E - 3\sigma_J < J(x) < E + 3\sigma_J$.
    • If $J(x)$ lies within this range, the result of the state estimation is considered valid.
  4. Handling Bad Data:

    • Bad data can be detected by evaluating the residual $r_i$. If $|r_i|$ exceeds a threshold, the data is considered bad. The threshold can be defined as follows: $|r_i| > \alpha \cdot \sigma_i $, where $\alpha$ is typically in the range of 3 to 5.

    • Bad data should be excluded from the calculation of $J(x)$ or included with an adjusted weight $w$. The adjusted weight can be calculated as:

      $w_i^{\text{new}} = \frac{w_i}{1 + \alpha \cdot \frac{r_i^2}{\sigma_i^2}}$

  • $w_i = \frac{1}{\sigma_i^2}$

scud-soptim avatar Jun 14 '24 11:06 scud-soptim

see also #377

scud-soptim avatar Jun 15 '24 14:06 scud-soptim

Hi @sudo-ac,

This is an interesting proposal. If I understand correctly, this is the normalized residual test, right?

Your expected value $E$ is the number of over-determined measurements in the system.

TonyXiang8787 avatar Jun 27 '24 19:06 TonyXiang8787

Thank you for your interest. Yes, you are correct, this is the normalized residual test. The expected value, as mentioned, is indeed the number of over-determined measurements in the system. This approach helps in validating the state estimation by ensuring that the computed residuals fall within the acceptable range defined by the 3-sigma band. Additionally, handling bad data appropriately will enhance the accuracy and reliability of the state estimation.

scud-soptim avatar Jul 03 '24 11:07 scud-soptim

Dear @sudo-ac,

Thanks for your input. We are considering implementing this in the following order:

  1. Try your method in some real-world state estimation study cases in research scripts. Evaluate the adjust the mathematics.
  2. Implement this bad data detection in the Python side of the PGM.
  3. Move the implementation to C++ core.

I have two questions for you still.

Formula

Can you have a look again on the formulas you proposed? Some of them do not make completely sense. For example, $$E - 3 \sigma_J$$ is always negative. So the left comparison $$E - 3 \sigma_J < J(x) $$ always holds? Also, when you calculate the $$ J(x)$$, do you actually need to put a square root?

Please have a check on the formulas and make the adjustment if needed.

Numerical example

Can you give us some numerical examples to show how your algorithm will work?

TonyXiang8787 avatar Jul 23 '24 07:07 TonyXiang8787

Yes, but please give me some time to examine an example.

scud-soptim avatar Jul 23 '24 15:07 scud-soptim

Now I see your problem. Here is an example:

First of all, there was a mistake in the calculation of the variance. The variance should be calculated as follows: $\sigma^2 = 2 \times (m - n) $

Bus 1-3: U = 12.5 kV

    1*                2 
G1->|--x-----L_12--x-|<-G2
    x               x
     .             .
       L_13    L_24
         .      .
          x___x->L1
            3

Bus 1 = Slack x: Powerflow - Measurement P, Q

3 Nodes, 3 Branches

Measurements $m$:

3*Voltage (Bus1 - Bus3) + 6 * 2 (P, Q) + 3 * 2 (P, Q-Measurement for Generation at Bus 1 and 2, Load at Bus 3) = 21 $m = 3 + 12 + 6 = 21$

State Variables $n$:

$n = 2 \times \text{Nodes} - \text{Slack} = 2 \times 3 - 1 = 5$

Expected Value $E$:

$E = m - n = 21 - 5 = 16$

Variance $\sigma^2$:

$\sigma^2 = 2 \times (m - n) = 32$ $\sigma = \sqrt{32} \approx 5.66$

Confidence Interval: $3 \times \sigma = 3 \times 5.66 = 16.98$ -> This results in a negative value when subtracted from the expected value:

$16 - 16.98 = -0.98$

Therefore, one must use the chi-squared distribution.

from scipy.stats import chi2

# Degrees of Freedom
df = 16

# Confidence Level 99.8% (3 sigma band)
alpha = 0.002  # 1 - 0.998

# Calculation of critical values
critical_value_lower = chi2.ppf(alpha / 2, df)
critical_value_upper = chi2.ppf(1 - alpha / 2, df)

print(f"Lower critical value: {critical_value_lower}")
print(f"Upper critical value: {critical_value_upper}")

gives:

Lower critical value: 3.9416278434807315
Upper critical value: 39.252354790768464

$J(x) = \sum_{i=1}^{m} \left( \frac{r_i}{\sigma_i} \right)^2$

should be in the interval [3.94, 39.25].

Conclusion: The given equation for the variance is only valid if the number of degrees of freedom $E$ is greater than 30 ( the chi-squared distribution approximates a normal distribution if $E > 30$, see E. Handschin "Elektrische Energie Übertragungssystem" Page 175ff). This condition is typically met for practical grids. For smaller systems, the chi-squared distribution must be used.

scud-soptim avatar Jul 23 '24 17:07 scud-soptim