Kalman-and-Bayesian-Filters-in-Python
Error in use of scale in Chapter 2: Noisy Sensors
First, the information about how noisy the sensor is:
Say we get a reading of door, and suppose that testing shows that the sensor is 3 times more likely to be right than wrong. We should scale the probability distribution by 3 where there is a door. If we do that the result will no longer be a probability distribution, but we will learn how to fix that in a moment.
The problem is how this information is applied. From the example code with normalization further down:
```python
import numpy as np
from filterpy.discrete_bayes import normalize
import kf_book.book_plots as book_plots  # plotting helpers shipped with the book's repository

def scaled_update(hall, belief, z, z_prob):
    scale = z_prob / (1. - z_prob)
    belief[hall == z] *= scale   # scale up the positions that match the reading
    normalize(belief)            # renormalize in place so the belief sums to 1

# hallway layout from earlier in the chapter: 1 marks a door, 0 a wall
hallway = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])
belief = np.array([0.1] * 10)
scaled_update(hallway, belief, z=1, z_prob=.75)

print('sum =', sum(belief))
print('probability of door =', belief[0])
print('probability of wall =', belief[2])
book_plots.bar_plot(belief, ylim=(0, .3))
```
which prints
```
sum = 1.0
probability of door = 0.1875
probability of wall = 0.06249999999999999
```
The problem here is that the example makes the posterior probability of being at any single door 3 times greater than at any single wall. But this is not the same as saying that the measurement has a 0.75 chance of being correct (being at a door) and a 0.25 chance of being incorrect (being at a wall), unless the numbers of doors and walls in the problem are equal.
The sum of all probabilities for being at a door should be 3 times the sum of all probabilities of being at a wall. In this case, that would mean the probability of being at any particular door should be 0.25 and the probability of being at any particular wall should be 0.0357, since there are three doors (total probability of being at any of them is 0.75) and seven walls (total probability of being at any of them is 0.25).
It is a bit easier to see in the absurd case of one door and an infinite number of walls. If the measurement says the dog is at a door, then the probability of being at that one door is 0.75 and the probability of being at any single wall is a positive infinitesimal (there is an infinite number of walls, after all), but the sum of the wall probabilities is 0.25. The calculation in this chapter, by contrast, would have the probability of being at the door go toward zero as the number of walls increases, eventually underflowing to exactly zero due to floating-point limits (assuming the computer has enough RAM and swap for a belief array big enough for underflow to be reached).
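The limiting behavior of the book's update is easy to probe numerically. This sketch reimplements `scaled_update` with plain NumPy (no filterpy or plotting, so it stands alone) and applies it to a hypothetical layout with one door and a growing number of walls:

```python
import numpy as np

def scaled_update(hall, belief, z, z_prob):
    """The book's update: scale the positions matching the reading, then renormalize."""
    belief = belief.copy()
    belief[hall == z] *= z_prob / (1. - z_prob)
    return belief / belief.sum()

# One door at position 0, n_walls walls, uniform prior; the sensor reads door (z=1).
for n_walls in (10, 1000, 100000):
    hall = np.zeros(n_walls + 1, dtype=int)
    hall[0] = 1
    prior = np.ones(n_walls + 1) / (n_walls + 1)
    post = scaled_update(hall, prior, z=1, z_prob=0.75)
    print(n_walls, post[0])   # door posterior shrinks as walls are added
```

With a uniform prior the door's posterior works out to 3/(N+3) for N walls, which indeed shrinks toward zero as N grows.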
Hi, I'm not sure I follow. Let's stick with infinite walls and assume infinite precision on the cpu floating point. And let's say there is only one door.
So, if I'm standing in front of a door there is a 75% chance of the sensor saying door, 25% of saying hallway.
On the other hand, if I am actually in front of a wall, there is a 25% chance of it saying door, which gives me an infinite number of false door detections vs. 3/4 of a true door detection. Hence, if the sensor says 'door', it is only infinitesimally likely that I am in fact in front of the door, because I am being swamped by false positives.
It's the old "if a medical test is 99% accurate, and it says you have the disease, what are the chances you have the disease" problem. It's not 99%. If the disease is extremely rare (1 in a billion) and you run the test over the population of the earth, you get around 6 correct positive results, but around 60 million incorrect positive results (1% of roughly 6 billion people). No need to update your will just yet!
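As a sanity check on those orders of magnitude (hypothetical figures: a population of 6 billion, a 1-in-a-billion disease, a 99% accurate test):

```python
population = 6_000_000_000   # rough world population used in the example
prevalence = 1e-9            # 1-in-a-billion disease
accuracy = 0.99              # test is 99% accurate

sick = population * prevalence
healthy = population - sick

true_positives = sick * accuracy             # correct positives: about 6
false_positives = healthy * (1 - accuracy)   # incorrect positives: tens of millions

# P(sick | positive) via Bayes: tiny, despite the 99% accuracy
p_sick_given_pos = true_positives / (true_positives + false_positives)
print(true_positives, false_positives, p_sick_given_pos)
```

The false positives swamp the true positives by a factor of about ten million, which is exactly the effect being described for the walls.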
In Bayes' terms:

p(door|z) = p(z|door) p(door) /
            [p(z|door) p(door) + p(z|wall) p(wall)]
We can see already that p(door) is ~ 0, so p(door|z) ~0 due to the multiplication in the numerator, and so a value of ~33% (which I think you are suggesting if there are 3 doors) must be wrong.
Am I missing your point?
Hm, you know, based on your response, I think I got it wrong.
Let's say there are N_W walls and N_D doors. The possible states are W for being at a wall and D for being at a door (upper-case letters). The sensor reading is w for wall and d for door. It is correct 75% of the time and incorrect 25% of the time, meaning that
p(w|W) = 0.75
p(d|W) = 0.25
p(w|D) = 0.25
p(d|D) = 0.75
The probability assuming no prior information of being in front of a door or wall is
p(W) = N_W / (N_D + N_W)
p(D) = N_D / (N_D + N_W)
So then, using Bayes' theorem,
p(D|d) = p(d|D) p(D) / p(d)
= p(d|D) p(D) / [p(d|D) p(D) + p(d|W) p(W)]
= [N_D / (N_D + N_W)] p(d|D) / {[N_D / (N_D + N_W)] p(d|D) + [N_W / (N_D + N_W)] p(d|W)}
= N_D p(d|D) / [N_D p(d|D) + N_W p(d|W)]
In the limit that N_W / N_D goes to infinity, p(D|d) goes to 0 and therefore p(W|d) goes to 1.
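Plugging numbers into that last expression shows the limit numerically (one door is an arbitrary illustrative choice here):

```python
def p_D_given_d(n_d, n_w, p_d_D=0.75, p_d_W=0.25):
    # p(D|d) = N_D p(d|D) / [N_D p(d|D) + N_W p(d|W)]
    return n_d * p_d_D / (n_d * p_d_D + n_w * p_d_W)

for n_w in (1, 100, 10_000, 1_000_000):
    print(n_w, p_D_given_d(1, n_w))   # tends to 0 as walls dominate
```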
p(D|w) = p(w|D) p(D) / p(w)
= p(w|D) p(D) / [p(w|D) p(D) + p(w|W) p(W)]
= [N_D / (N_D + N_W)] p(w|D) / {[N_D / (N_D + N_W)] p(w|D) + [N_W / (N_D + N_W)] p(w|W)}
= N_D p(w|D) / [N_D p(w|D) + N_W p(w|W)]
In the limit that N_W / N_D goes to infinity, p(D|w) also goes to 0 and therefore p(W|w) goes to 1. But we can still look at the ratio of probabilities for being at a D depending on whether the sensor says d or w.
p(D|d) / p(D|w) = {N_D p(d|D) / [N_D p(d|D) + N_W p(d|W)]} / {N_D p(w|D) / [N_D p(w|D) + N_W p(w|W)]}
= {p(d|D) [N_D p(w|D) + N_W p(w|W)]} / {p(w|D) [N_D p(d|D) + N_W p(d|W)]}
In the limit that N_W / N_D goes to infinity, this ratio is finite and is
p(D|d) / p(D|w) -> {p(d|D) p(w|W)} / {p(w|D) p(d|W)}
= 0.75 * 0.75 / (0.25 * 0.25)
= 9
And the other ratio, in the limit of N_W / N_D going to infinity, is
p(W|d) / p(W|w) -> 1
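Both limits can be checked numerically. This sketch evaluates p(D|d)/p(D|w) from the closed-form posteriors above, for one door and an increasing wall count:

```python
def posterior(n_d, n_w, p_z_D, p_z_W):
    """p(D|z) = N_D p(z|D) / [N_D p(z|D) + N_W p(z|W)]."""
    num = n_d * p_z_D
    return num / (num + n_w * p_z_W)

for n_w in (10, 10_000, 10_000_000):
    p_D_d = posterior(1, n_w, 0.75, 0.25)   # sensor said door
    p_D_w = posterior(1, n_w, 0.25, 0.75)   # sensor said wall
    ratio = p_D_d / p_D_w
    print(n_w, ratio)   # approaches 9 as walls dominate
```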
@rlabbe in your response just above you say "So, if I'm standing in front of a door there is a 75% chance of the sensor saying door, 25% of saying hallway." Let's call this variant (A).
This is the bit that really confused me, because that's actually not what you say in the main text: "the sensor is 3 times more likely to be right than wrong". Let's call this variant (B).
Sticking with the notation of @frejanordsiek ($d$ and $w$ refer to measuring door and wall respectively, and $D$, $W$ refer to there actually being a door or wall respectively):
- (A) can be mathematically formulated as $P(d|D) = 0.75$, $P(w|D) = 0.25$, $P(w|W) = 0.75$, $P(d|W) = 0.25$
- But variant (B) is actually saying $P(D|d) = 0.75$, $P(W|d) = 0.25$, $P(W|w) = 0.75$, $P(D|w) = 0.25$
I'm assuming variant A is the correct one based on the code. Right?
Why does this matter? Because let's say we pick position 0. We can use Bayes to find $P(G_0 | d)$ where $G_0$ indicates that the dog is at position 0. Let's use variant (A).
$$
\begin{align}
P(G_0|d) &= \frac{P(d|G_0) \cdot P(G_0)}{Z} \quad \text{(where } Z \text{ is the partition function)} \\
&= \frac{P(d|D) \cdot P(G_0)}{Z} \quad \text{(using the knowledge that position 0 has a door)} \\
&= \frac{0.75 \cdot 0.1}{Z} \quad \text{(plugging in our values)}
\end{align}
$$
We can also calculate something similar for position 2:
$$
\begin{align}
P(G_2|d) &= \frac{P(d|G_2) \cdot P(G_2)}{Z} \quad \text{(where } Z \text{ is the partition function)} \\
&= \frac{P(d|W) \cdot P(G_2)}{Z} \quad \text{(using the knowledge that position 2 has a wall)} \\
&= \frac{0.25 \cdot 0.1}{Z} \quad \text{(plugging in our values)}
\end{align}
$$
So this way we arrive at the conclusion that the update should make places where there are doors 3x larger than places where there aren't (because $P(G_0|d) / P(G_2|d) = 3$), thereby validating the "scale" approach you take in the code.
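That conclusion can be verified in a few lines of NumPy, using the chapter's 10-position hallway (doors assumed at positions 0, 1, and 8) and the variant (A) likelihoods:

```python
import numpy as np

# Hallway layout assumed from Chapter 2: doors at positions 0, 1, and 8.
hallway = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])
prior = np.full(10, 0.1)                          # uniform prior

# Variant (A) likelihoods for the reading z = door: P(d|G_i)
likelihood = np.where(hallway == 1, 0.75, 0.25)

posterior = likelihood * prior
posterior /= posterior.sum()                      # divide by the partition function Z

print(posterior[0], posterior[2], posterior[0] / posterior[2])
```

The door/wall ratio comes out as 3, and `posterior[0]` reproduces the 0.1875 printed by the book's `scaled_update`, so the explicit Bayes computation and the scale shortcut agree.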
BUT if you try to use variant B to do similar working, you end up getting stuck:
$$
\begin{align}
P(G_0|d) &= \frac{P(d|G_0) \cdot P(G_0)}{Z} \quad \text{(where } Z \text{ is the partition function)} \\
&= \frac{P(d|D) \cdot P(G_0)}{Z} \quad \text{(using the knowledge that position 0 has a door)}
\end{align}
$$
What is $P(d|D)$?
EDIT Looking at chapter 3 on Gaussians you wrote: "In filtering problems computing $p(x\mid z)$ is nearly impossible, but computing $p(z\mid x)$ is straightforward. Bayes' lets us compute the former from the latter". So now it's quite clear to me that variant A is the correct one.
Thanks for this discussion! I was confused with that phrase ("the sensor is 3 times more likely to be right than wrong") as well but saw that it's clarified a little later in the same chapter. By combining the clarification with the original paragraph, I'm now able to understand it much better: