raytracing.github.io
Book 3.3.7: Magic transformation
In listing 13 we transform our code to compute the integral of $x^2$ using explicit inverse CDF ($f$) and PDF ($p$) functions.
(Note: using $f$ to denote $CDF^{-1}$ is very unfortunate, since the book also uses $f$ to denote arbitrary functions in several places.)
After the listing there are a few "controversial" statements, which are glossed over and are not explained:
- "We were previously multiplying the average over the interval (`sum / N`) times the length of the interval (`b - a`) to arrive at the final answer. Here, we don't need to multiply by the interval length—that is, we no longer need to multiply the average by 2."

  What just happened? One doesn't simply decide that "we don't need to multiply by the interval length" and magically arrive at the right result. There must be some logic (math) behind it, which is glossed over.
- "To remove this bias, we need to down-weight where we sample more frequently, and to up-weight where we sample less frequently. ... This is why we divide `x*x` by `pdf(x)`."

  I can intuitively understand that since we are sampling more frequently in the areas of high PDF, we want to down-weight those samples to avoid the bias (and vice versa). But how/why did we decide that dividing by the PDF is exactly the right amount?
My gut feeling on these two points is this:
Since `pdf(x)` yields values less than 1, we are actually up-weighting all samples (with less frequently sampled areas being up-weighted more than the more frequently sampled ones). And the total amount of up-weighting ends up being equal to (b - a).
And so, dividing all samples by `pdf(x)` ends up having the same overall effect as multiplying by (b - a) in the earlier version of the code.
I was able to tie points 1 and 2 above together. Assuming point 2 is correct, we can rewrite our formula for estimating the area under a function $f(x)$ with PDF $p(x)$ as follows:
$$ area(f(x), a, b) \approx \frac 1 N \sum_{i=0}^{N-1} \frac {f(x_i)} {p(x_i)} \enspace \enspace \enspace (1) $$
How does this tie in with the earlier formula from chapter 3.1?
$$ area(f(x), a, b) \approx \frac{b - a}{N} \sum_{i=0}^{N-1} f(x_i) \enspace \enspace \enspace (2) $$
Well, in chapter 3.1 we used an implicit constant PDF $p(x) = C$. We know that the integral of a PDF must equal 1, and if we integrate $p(x) = C$ over the interval $[a, b]$, we find that:
$$ p(x) = \frac 1 {b - a} $$
Now, if we re-arrange the formula (2) above we get:
$$ area(f(x), a, b) \approx \frac 1 N \sum_{i=0}^{N-1} \frac {f(x_i)} {(\frac 1 {b - a})} = \frac 1 N \sum_{i=0}^{N-1} \frac {f(x_i)} {p(x_i)} $$
which is the same as formula (1).
QED
The only open question remaining is: what magic did we use to come up with formula (1)? In other words, why does up/down-weighting by the PDF yield the correct result?
Found this Wikipedia article, Monte Carlo integration, which confirms that formula (1) above is the right way of integrating an arbitrary function $f$ with PDF $p$ using importance sampling. The key observation is that when the $x_i$ are drawn with density $p$, the expected value of each term is $\int_a^b \frac{f(x)}{p(x)} p(x)\,dx = \int_a^b f(x)\,dx$, which is exactly the integral we are after.
One important note: to get accurate results, the samples $x_i$ have to be drawn from the PDF $p$ and not from a uniform distribution. I believe this was made abundantly clear in chapters 3.5 and 3.6.