
Expectations in kernel weights

Open martinfleis opened this issue 7 months ago • 8 comments

As I am playing with gwlearn and its kernels, comparing them to mgwr, I noticed that the kernels in libpysal behave unexpectedly (besides an actual bug in #789). I would assume that we use the kernel function, say Gaussian, to assign weights going from 1 at distance 0 progressively down to 0. But that is not the case. The Gaussian kernel instead starts at about 0.4, as shown in the figure below (bandwidth 100). The same happens in GeoDa.

[Image: Gaussian kernel weights in libpysal, bandwidth 100, starting at ~0.4]

However, if I use the kernel as defined in mgwr, it behaves as I expected.

import numpy

def _gaussian_mgwr(distances, bandwidth):
    zs = distances / bandwidth
    return numpy.exp(-0.5 * zs ** 2)

[Image: Gaussian kernel as defined in mgwr, bandwidth 100, starting at 1]

The reason I don't understand why we should not start at 1 is that if I build a kernel-based W and want to assign a self weight afterwards, I have pretty much no way of doing that correctly. The value at distance 0 is 0.3989422804014327, though I would intuitively expect it to be 1.

Similar behaviour, albeit less drastic, appears in a few of the other kernels.

The formula for Gaussian we use is this:

import numpy

def _gaussian(distances, bandwidth):
    u = distances / bandwidth
    return numpy.exp(-((u / 2) ** 2)) / (numpy.sqrt(2 * numpy.pi))

Why is it divided by numpy.sqrt(2 * numpy.pi)? If we drop the division, it matches MGWR and also the Gaussian kernel definition used in support vector machines and elsewhere.

Can someone shed some light on this? It follows the formula I found in @sjsrey's Modern Spatial Econometrics in Practice, but I have no clue where that comes from or why it looks like this.

martinfleis avatar Jun 02 '25 15:06 martinfleis

Just making some notes here, primarily for myself to wrap my head around this.

Over in https://github.com/uscuni/gwlearn/issues/3#issuecomment-2920584059 @TaylorOshan mentions that we need to take the square root of the weights before passing them to sklearn, which applies to the formula in MGWR. If we take the sqrt of that, we get exactly the same curve as we would get in libpysal without the division.

import numpy
import matplotlib.pyplot as plt

def _libpysal_no_division(distances, bandwidth):
    u = distances / bandwidth
    return numpy.exp(-((u / 2) ** 2))

def _mgwr(distances, bandwidth):
    zs = distances / bandwidth
    return numpy.exp(-0.5 * zs ** 2)

x = numpy.linspace(10, 500)
plt.plot(x, _libpysal_no_division(x, 100), label="lib_no_division")
plt.plot(x, numpy.sqrt(_mgwr(x, 100)), label="sqrt of mgwr", linestyle="--")
plt.plot(x, _mgwr(x, 100), label="mgwr")
plt.legend()

[Image: lib_no_division and sqrt of mgwr overlapping exactly, with mgwr below them]

martinfleis avatar Jun 02 '25 18:06 martinfleis

The formula for Gaussian we use is this:

def _gaussian(distances, bandwidth):
    u = distances / bandwidth
    return numpy.exp(-((u / 2) ** 2)) / (numpy.sqrt(2 * numpy.pi))

Why is it divided by (numpy.sqrt(2 * numpy.pi))? If we don't then it matches MGWR and also the Gaussian kernel definition in support vector machine and elsewhere.

Can someone shed some light on this? It follows the formula I've found in @sjsrey's Modern spatial econometrics in practice but have no clue where that comes from and why it looks like this.

The value at 0 is 0.3989422804014327 because that is the value of the Gaussian kernel at $z=0$: with $K(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$, we get $K(0) = \frac{1}{\sqrt{2\pi}} \approx 0.3989$. This is a normalized kernel, whereas omitting the division by numpy.sqrt(2 * numpy.pi) gives an unnormalized kernel.

sjsrey avatar Jun 03 '25 15:06 sjsrey

I think I got that far in my understanding. What I don't get is why I would want the normalised option.

Shall we maybe add a keyword to build_kernel controlling this normalisation?

martinfleis avatar Jun 03 '25 16:06 martinfleis

There are different ways of normalizing kernels. Some normalize the kernel values so that K(0) = 1. Others normalize the kernel so that integral(K(x)) = 1 over the domain of the kernel. We sometimes call the latter "proper" kernels in Bayesian computation, since they are "proper" probability distributions. See, for instance, this list. If we were opinionated, we would standardize the kernels so that the integral is always 1, or so that K(0) = 1.
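For the Gaussian case, the two conventions differ exactly by the 1 / sqrt(2 * pi) factor: the "proper" version integrates to 1, while the K(0) = 1 version integrates to sqrt(2 * pi). A quick numeric check (a plain Riemann sum, which suffices on a fine grid):

```python
import numpy

z = numpy.linspace(-10, 10, 200001)
dz = z[1] - z[0]

proper = numpy.exp(-0.5 * z**2) / numpy.sqrt(2 * numpy.pi)  # integral ~ 1
peak_one = numpy.exp(-0.5 * z**2)                           # K(0) == 1

print(numpy.sum(proper) * dz)    # ~1.0
print(numpy.sum(peak_one) * dz)  # ~2.5066, i.e. sqrt(2 * pi)
```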

ljwolf avatar Jun 03 '25 16:06 ljwolf

I should say: we're not opinionated, so we currently have a mix of kernels normalized as K(0) = 1 and as integral(K(x)) = 1.

ljwolf avatar Jun 03 '25 16:06 ljwolf

Adding a keyword arg would increase the clarity and allow for different use cases (i.e., kernels as proper pdfs, versus unnormalized kernels).

sjsrey avatar Jun 03 '25 16:06 sjsrey

Agreed to schedule a sprint on this.

martinfleis avatar Jun 12 '25 15:06 martinfleis

Relevant discussion in https://github.com/pysal/mgwr/issues/157

martinfleis avatar Jun 12 '25 15:06 martinfleis