Kernel docstring does not mention unique Gaussian kernel behavior
I keep getting bit by this.
Our Gaussian kernel only computes for observations within the bandwidth distance.
But, in theory, this isn't necessary, since observations are still connected in the Gaussian kernel past this bandwidth.
Thus, in quite a few cases, this cutoff results in truncation at a pretty high $w_{ij}$; I get truncation at around .25 with an adaptive bandwidth on the Berlin neighborhoods data from geopython (https://github.com/ljwolf/geopython)...
Since this isn't going to be fixed (I recall @TaylorOshan running into this when trying to build GWR on top of existing PySAL stuff), we need to disclaim that we force all kernels to be truncated at the bandwidth.
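For reference, a minimal sketch of where that ≈.25 figure comes from, assuming the kernel is the standard normal density of u = d / bandwidth (which I believe is the form our Gaussian kernel uses; treat the exact normalization as an assumption):

```python
import numpy as np

def gaussian_kernel(distances, bandwidth):
    """Gaussian kernel assumed here: standard normal density of d / bandwidth."""
    u = np.asarray(distances) / bandwidth
    return np.exp(-(u ** 2) / 2) / np.sqrt(2 * np.pi)

# Weight exactly at the bandwidth: everything beyond this is dropped,
# even though the kernel itself never reaches zero.
w_at_edge = gaussian_kernel(1.0, bandwidth=1.0)
print(round(float(w_at_edge), 3))  # ~0.242, i.e. the truncation "at around .25"
```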
Yep, IIRC I learned this was a feature to avoid memory bottlenecks for when there are many points. For GWR, I ended up making a custom kernel class based on this class that can optionally truncate.
We could revisit this as the reason for truncation may be less compelling now than when we initially implemented this.
Reopening this, as having this option may be useful for gwlearn. We should still clip at some point, but maybe not at the bandwidth, rather further away.
We could resolve this by adding a clip keyword to build_kernel, defaulting to clip=True (clipping at the bandwidth), but also allowing clip=False, which would essentially result in a full dense $n \times n$ matrix, or clip=float, which would fetch geometries within the clip distance and apply the kernel with a set bandwidth. You are risking the Graph being way too dense and memory-heavy, but for many applications that could be just fine. In gwlearn, we could either use False for small datasets or clip by something like 3*bandwidth (3 std) for larger ones. Or let the user specify that.
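To make the proposal concrete, here is a minimal sketch of how the three clip modes could be resolved into a neighbor-search radius (the helper name resolve_clip is hypothetical, not part of the current API):

```python
import numpy as np

def resolve_clip(clip, bandwidth):
    """Hypothetical helper: translate the proposed ``clip`` keyword into the
    radius used to fetch neighbor candidates.

    clip=True  -> clip at the bandwidth (current behaviour)
    clip=False -> no clipping, i.e. a dense n-by-n graph
    clip=float -> clip at that distance; the kernel still uses ``bandwidth``
    """
    if clip is True:
        return bandwidth
    if clip is False:
        return np.inf
    return float(clip)

print(resolve_clip(True, 2.0))   # 2.0
print(resolve_clip(False, 2.0))  # inf
print(resolve_clip(6.0, 2.0))    # 6.0, e.g. 3 * bandwidth (~3 std)
```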
I like this approach. On the one hand, the memory issue is probably less of a problem for previous cases, but may still be an issue for newer, or future cases on larger data sets.
Would also be great because it would allow us to consume this downstream in MGWR.
I'm confused... we already have this functionality in Graph: a "distance-clipped" Gaussian kernel is a distance band, Graph.build_distance_band(coordinates, my_threshold, kernel='gaussian', bandwidth=my_bandwidth). A "knn-clipped" triangular kernel is Graph.build_kernel(coordinates, k=my_threshold, kernel='triangular'). To make things performant, you basically must use kdtree.query_ball_point()/kdtree.sparse_distance_matrix() (as in Graph.build_distance_band()) or kdtree.query() (as in Graph.build_knn()) to calculate tapered kernels efficiently.
If we want to support a second distance-based tapering of un-tapered kernels (right now only gaussian, exponential) through Graph.build_kernel(), we should extend the existing taper argument. I think using the taper argument makes more sense than a new clip argument because the typical terminology in stat learning for this approach is called "covariance tapering" or "kernel tapering". The taper argument as it works now means that taper=True would be the same as taper=my_bandwidth. We're just allowing the user to set the tapering threshold to an arbitrary value now.
For example,

```python
Graph.build_kernel(coordinates, bandwidth=1, taper=2, kernel='exponential', k=None)
```

would use the same codepath as:

```python
Graph.build_distance_band(coordinates, threshold=2, kernel='exponential', bandwidth=1)
```
This could be done by using _distance_band() and taper similarly to how we use _knn() and k within _kernel(): if taper is not False, we dispatch to _distance_band() to calculate the distance matrix, setting taper=bandwidth when taper is True, and then proceed as usual. The codepath using pdist() would remain the same for when taper is False.
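A rough sketch of that dispatch, with the private helpers inlined and the function name hypothetical, for a Gaussian kernel (the exact kernel normalization is an assumption):

```python
import numpy as np
from scipy.spatial import KDTree
from scipy.spatial.distance import pdist, squareform

def tapered_gaussian(coords, bandwidth, taper=True):
    """Sketch of the proposed dispatch.

    taper=True  -> taper at the bandwidth (current behaviour)
    taper=float -> taper at that distance instead
    taper=False -> full dense pairwise distances, no taper
    """
    coords = np.asarray(coords, dtype=float)
    if taper is not False:
        radius = bandwidth if taper is True else float(taper)
        # _distance_band-style path: only pairs within the taper radius
        tree = KDTree(coords)
        d = tree.sparse_distance_matrix(tree, radius).toarray()
        mask = d > 0  # drops self-pairs (and, in this sketch, coincident points)
    else:
        # pdist-style path: full dense distance matrix
        d = squareform(pdist(coords))
        mask = ~np.eye(len(coords), dtype=bool)
    u = d / bandwidth
    w = np.exp(-(u ** 2) / 2) / np.sqrt(2 * np.pi)
    return np.where(mask, w, 0.0)
```

With taper=True, any pair beyond the bandwidth gets weight exactly zero; with taper=False, every off-diagonal pair keeps a small positive weight, however far apart.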
Also consider: some kernels are tapered by distance by definition (see the numpy.clip() steps in Graph._kernel), while others (exponential, gaussian) can be tapered/not tapered using either distance- or KNN-tapering. This means we'd also need to handle the case where someone specifies both KNN- and distance-tapering together:
```python
# this kernel looks like a house:
#    /\
#   |  |
# --    --
Graph.build_kernel(coordinates, kernel='triangular', k=5, bandwidth=1, taper=.5)
```
If both taper and k are provided, we can do the knn search and then filter the resulting distances by taper.
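A minimal sketch of that combined case, knn search first and a distance filter second (the function name is hypothetical; only the KDTree calls are real scipy API):

```python
import numpy as np
from scipy.spatial import KDTree

def knn_then_taper(coords, k, taper):
    """Find each point's k nearest neighbors, then drop those beyond taper."""
    coords = np.asarray(coords, dtype=float)
    tree = KDTree(coords)
    # query k + 1 neighbors because each point is returned as its own nearest
    dist, idx = tree.query(coords, k=k + 1)
    pairs = []
    for i in range(len(coords)):
        for d, j in zip(dist[i, 1:], idx[i, 1:]):  # skip self at column 0
            if d <= taper:                          # second, distance-based filter
                pairs.append((i, int(j), float(d)))
    return pairs
```

So a point keeps at most k neighbors, and fewer when some of those k sit beyond the taper distance.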
Ah, that is true. Though it is clearly not a straightforward solution :). I like the idea of the taper keyword, to make these things a bit easier to think about. Also note that we currently don't expose taper in Graph.build_kernel, but that is easy to change.
If both taper and k are provided, we can do the knn search and then filter the resulting distances by taper
sounds good to me