Add kw arg to normalize kernel in distance weights.
This is responding to the discussion in #790.
The default `normalize=True` preserves the current behavior of the Gaussian kernel weights (heavily used in `spreg`):
$$K(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2}$$
Setting `normalize=False` provides an option to use unnormalized Gaussian weights:
$$K(z) = e^{-\frac{1}{2} z^2}$$
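To make the difference concrete, a minimal NumPy sketch of the two forms (the `gaussian` helper below is illustrative, not the libpysal implementation):

```python
import numpy as np

def gaussian(z, normalize=True):
    """Gaussian kernel at z = d / bandwidth.

    normalize=True  -> K(z) = exp(-z**2 / 2) / sqrt(2 * pi)  (current behavior)
    normalize=False -> K(z) = exp(-z**2 / 2), which peaks at 1 for z = 0
    """
    k = np.exp(-0.5 * np.asarray(z, dtype=float) ** 2)
    return k / np.sqrt(2 * np.pi) if normalize else k

z = np.array([0.0, 0.5, 1.0])
print(gaussian(z, normalize=True))   # [0.3989, 0.3521, 0.2420]
print(gaussian(z, normalize=False))  # [1.    , 0.8825, 0.6065]
```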
Codecov Report
:x: Patch coverage is 98.38710% with 1 line in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 85.4%. Comparing base (44ede54) to head (a5fdb75).
:warning: Report is 22 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| libpysal/kernels.py | 97.6% | 1 Missing :warning: |
Additional details and impacted files
```diff
@@           Coverage Diff            @@
##            main     #791     +/-   ##
========================================
  Coverage    85.4%    85.4%
========================================
  Files         150      151      +1
  Lines       15971    15984     +13
========================================
+ Hits        13632    13649     +17
+ Misses       2339     2335      -4
```
| Files with missing lines | Coverage Δ | |
|---|---|---|
| libpysal/graph/_kernel.py | 79.2% <100.0%> (-3.2%) | :arrow_down: |
| libpysal/graph/tests/test_builders.py | 100.0% <100.0%> (ø) | |
| libpysal/graph/tests/test_kernel.py | 99.1% <100.0%> (+<0.1%) | :arrow_up: |
| libpysal/graph/tests/test_triangulation.py | 98.8% <ø> (ø) | |
| libpysal/weights/distance.py | 85.6% <100.0%> (+0.2%) | :arrow_up: |
| libpysal/weights/tests/test_distance.py | 100.0% <100.0%> (ø) | |
| libpysal/kernels.py | 97.6% <97.6%> (ø) | |
Can we get it into Graph as well? And I suppose it could be generalised to kernels other than the Gaussian.
Kernels in PySAL
This note collects issues, background, and proposals for refactoring the treatment of kernels in PySAL, especially in relation to bandwidth handling, standardization, and cross-module reuse.
Background
Conceptual Overview
In nonparametric statistics, kernels are generally used for smoothing. At a given focal point, a weighted average of surrounding observations is computed, with weights that decrease with distance from the focal point. The kernel function defines the distance decay pattern of these weights.
In spatial analysis, kernels serve multiple purposes, which can be classified into three conceptual cases:
- Volume Standardization: Some applications require the kernel to be a proper probability density function (pdf), i.e., it integrates to 1 over its support. We refer to this as `volume_standardized` (also called normalization).
- Unnormalized Decay Weights: In other use cases, kernels define distance decay weights without requiring normalization. The kernel might be a proper pdf or not; sometimes the pdf's normalization factor is omitted.
- Standardized Peak Value: Some use cases require the kernel's maximum value at zero distance to equal 1, i.e., $K(0) = 1$. We refer to this as `zero_standardized`.
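For the Gaussian, the second and third cases happen to coincide, so the Epanechnikov kernel is a clearer illustration of how the conventions differ. A minimal sketch (the function and mode names are illustrative, not an existing libpysal API):

```python
import numpy as np

def epanechnikov(z, mode="decay"):
    """Epanechnikov (quadratic) kernel on |z| <= 1 under the three conventions above.

    mode="volume_standardized" -> (3/4)(1 - z**2): a proper pdf, integrates to 1
    mode="decay"               -> 1 - z**2: unnormalized distance decay weights
    mode="zero_standardized"   -> rescaled so K(0) = 1 (here identical to "decay")
    """
    z = np.clip(np.abs(z), 0.0, 1.0)
    k = 1.0 - z**2                   # bare decay weights, already K(0) = 1
    if mode == "volume_standardized":
        return 0.75 * k              # the 3/4 constant makes the integral equal 1
    return k

print(epanechnikov(0.0, mode="volume_standardized"))  # 0.75
print(epanechnikov(0.0, mode="zero_standardized"))    # 1.0
```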
Current Implementation
Location: `libpysal/weights/distance.py`
Notation
- $d_{i,j}$: distance between points $i$ and $j$
- $h_i$: bandwidth at focal unit $i$
- $z_{i,j} = d_{i,j}/h_i$: the scaled distance passed to the kernel
Kernel Table
The kernel functions, in the volume-standardized forms used in the derivations below:
| Kernel | $K(z)$, $\lvert z \rvert \le 1$ (Gaussian: all $z$) |
|---|---|
| Triangular | $1 - \lvert z \rvert$ |
| Uniform | $\frac{1}{2}$ |
| Quadratic (Epanechnikov) | $\frac{3}{4}(1 - z^2)$ |
| Quartic (Biweight) | $\frac{15}{16}(1 - z^2)^2$ |
| Gaussian | $\frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ |
A `diagonal` boolean flag in current code sets the kernel value at $d = 0$ to 1 when `True`, but this does not scale the function at $d > 0$. Thus, `diagonal=True` implies:
- `volume_standardized = False`
- `zero_standardized = True`
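A rough illustration of that flag with the existing `Kernel` weights class (the sample points are made up; only the self-weight behavior matters here):

```python
from libpysal.weights import Kernel

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 1.0)]

# Default: the self-weight of each focal point is K(0) of the chosen kernel,
# i.e. (2 * pi) ** -0.5 ~= 0.3989 for the Gaussian.
w = Kernel(points, function="gaussian")

# diagonal=True forces the self-weight to 1 but leaves all other weights
# untouched: zero standardized at d = 0, no longer volume standardized overall.
w_diag = Kernel(points, function="gaussian", diagonal=True)
```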
Derivations: Do Kernels Integrate to 1?
Triangular
$$ \int_{-1}^{1} (1 - |z|) \, dz = 2 \int_0^1 (1 - z) \, dz = 2 \left[ z - \frac{z^2}{2} \right]_0^1 = 2 \left(1 - \frac{1}{2} \right) = 1 $$
✅ Integrates to 1
Uniform
$$ \int_{-1}^{1} \frac{1}{2} \, dz = \left[ \frac{1}{2} z \right]_{-1}^{1} = \frac{1}{2}(1) - \frac{1}{2}(-1) = 1 $$
✅ Integrates to 1
Quadratic (Epanechnikov)
$$ \int_{-1}^{1} \frac{3}{4}(1 - z^2) \, dz = 2 \cdot \frac{3}{4} \int_0^1 (1 - z^2) \, dz $$
$$ = \frac{3}{2} \left[ z - \frac{z^3}{3} \right]_0^1 = \frac{3}{2} \left(1 - \frac{1}{3} \right) = \frac{3}{2} \cdot \frac{2}{3} = 1 $$
✅ Integrates to 1
Quartic (Biweight)
$$ \int_{-1}^{1} \frac{15}{16}(1 - z^2)^2 \, dz = 2 \cdot \frac{15}{16} \int_0^1 (1 - 2z^2 + z^4) \, dz $$
$$ = \frac{15}{8} \left[ z - \frac{2z^3}{3} + \frac{z^5}{5} \right]_0^1 = \frac{15}{8} \left(1 - \frac{2}{3} + \frac{1}{5} \right) $$
$$ = \frac{15}{8} \cdot \frac{15 - 10 + 3}{15} = \frac{15}{8} \cdot \frac{8}{15} = 1 $$
✅ Integrates to 1
Gaussian
$$ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz = 1 $$
✅ Integrates to 1 (by definition of the standard normal distribution)
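These results can also be verified numerically, e.g. with `scipy.integrate.quad` (a quick sketch, assuming SciPy is available):

```python
import numpy as np
from scipy.integrate import quad

kernels = {
    "triangular": lambda z: 1 - abs(z),
    "uniform": lambda z: 0.5,
    "quadratic": lambda z: 0.75 * (1 - z**2),
    "quartic": lambda z: (15 / 16) * (1 - z**2) ** 2,
    "gaussian": lambda z: np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi),
}

for name, kernel in kernels.items():
    lo, hi = (-np.inf, np.inf) if name == "gaussian" else (-1, 1)
    total, _ = quad(kernel, lo, hi)
    print(f"{name:10s} integrates to {total:.6f}")  # ~1.000000 for each kernel
```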
Why Set $K(0) = 1$?
Although not required, setting $K(0) = 1$ is often useful:
- Maximum Influence at Center: Ensures the focal point has the highest weight.
- Similarity Interpretation: Aligns kernel values with similarity measures.
- Conventional Form: Simplifies kernel expressions in some cases.
- Ease of Comparison: Facilitates comparison across kernels.
Examples:
- Triangular: $K(0) = 1$
- Epanechnikov: $K(0) = \frac{3}{4}$
- Biweight: $K(0) = \frac{15}{16}$
- Gaussian: $K(0) = \frac{1}{\sqrt{2\pi}} \approx 0.3989$
Note: Most kernels peak at 0, but their peak value varies unless explicitly standardized.
Possible Refactorings
- New `kernel.py` module
  - Consolidate kernel logic into a standalone module consumable by both `weights` and `graph`.
- kw args
  - Support `volume_standardized` (aka `normalize`) and `zero_standardized` (aka `scale`).
  - Ensure legacy behavior is preserved. Clarify the interpretation of kernel values across contexts.
- Bandwidth Strategies (a rough sketch follows this list)
  - Fixed bandwidth: e.g., max of minimum KNN distances
  - $k$ can vary (not necessarily 1)
- K-Nearest Neighbors (KNN)
  - Clarify how $k$ is computed and allow user overrides.
  - $k = \lfloor n^{1/3} \rfloor$
- ~~Gaussian Bandwidth Truncation~~
  - ~~PySAL: no truncation~~
  - GeoDa: truncates kernel at bandwidth
  - ~~Suggest: expose this as an option~~
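For the fixed-bandwidth item above (max of the minimum KNN distances, with a data-driven $k$), a minimal sketch; the helper name and the use of SciPy's `KDTree` are assumptions, not existing libpysal code:

```python
import numpy as np
from scipy.spatial import KDTree

def fixed_bandwidth(coordinates, k=None):
    """Fixed bandwidth: the maximum over all points of the distance to the k-th neighbor."""
    coordinates = np.asarray(coordinates, dtype=float)
    if k is None:
        k = max(1, int(np.floor(len(coordinates) ** (1 / 3))))  # k = floor(n ** (1/3))
    tree = KDTree(coordinates)
    # query k + 1 neighbors because the nearest "neighbor" is the point itself
    distances, _ = tree.query(coordinates, k=k + 1)
    return distances[:, -1].max()

rng = np.random.default_rng(0)
h = fixed_bandwidth(rng.uniform(size=(100, 2)))  # one shared bandwidth for all points
```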
Next Steps
- [ ] Draft `kernel.py` scaffolding
- [ ] Add tests for both normalization and zero standardization behavior
- [ ] Evaluate current usages in `weights` and `graph` for migration
- [ ] Decide on default behavior for legacy compatibility
One thing we need to figure out is handling of diagonal filling, as a user typically does not know the value of $K(0)$ for any kernel other than a zero-standardised one. That is what started my investigation into all of this. One way is exposing a keyword in the kernel builder that deals with the self-weight at build time, but that solves only part of the use cases. Given that we, by design, do not store information on the origin of the Graph, it is hard to keep the expected value around.
100% w/ the `kernel.py` file. To propagate this through to the Graph objects, would it make sense to handle this using some kind of class composition? I wonder because we allow the user to apply a kernel to basically any weight type. I think this would address @martinfleis's concern?
In this pattern, when a weight is built & kernel calculated, we set the `.kernel` attribute to be some `Kernel()` class object that allows you to calculate new values and store the parameterization of the kernel itself. This would also let you set up your own custom kernels and pass them down and apply them to spatial data during construction. If no kernel is used, we set `.kernel` to some default `Identity()` kernel.
```python
graph = Graph.build_delaunay(coordinates, kernel='parabolic')
graph.kernel(0)  # .75
graph.set_self_weight(1.0)  # warns if kernel is not Identity()
graph.add_self_weight()  # equivalent to graph.set_self_weight(graph.kernel(0)) w/ no warning
graph.kernel.bandwidth  # what is the length-scale of this kernel? (sometimes bandwidth, theta)
graph.kernel.taper  # past what distance is this kernel zero? (sometimes bandwidth also)
```
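For reference, a bare-bones reading of that composition idea (class names follow the proposal above; none of this exists in libpysal, and the idea is rejected below):

```python
import numpy as np

class Identity:
    """Fallback kernel when no kernel was used: weights are taken as-is, K(0) = 1."""
    def __call__(self, distance):
        return np.ones_like(np.asarray(distance, dtype=float))

class Parabolic:
    """Epanechnikov kernel that remembers its own parameterization."""
    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth  # length-scale of the decay
        self.taper = bandwidth      # distance past which the weight is zero

    def __call__(self, distance):
        z = np.clip(np.abs(distance) / self.bandwidth, 0.0, 1.0)
        return 0.75 * (1.0 - z**2)

kernel = Parabolic(bandwidth=2.0)
kernel(0)  # 0.75, the value a builder would need to fill the self-weight
```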
We've decided against this .kernel class composition strategy. It maintains too much state.
No matter what we do, we need to fix this, so that when distances are calculated, we populate the self-weight with K(0).
So, should the `taper`/`as_decay` changes in sjsrey/libpysal#30 instead target `libpysal/kernels.py` rather than `libpysal/graph/_kernels.py`?
We would need to intercept arguments in order to force all kernels to taper by default within `weights`, but this should not be too much code.
> So, should the `taper`/`as_decay` changes in sjsrey#30 instead target `libpysal/kernels.py` rather than `libpysal/graph/_kernels.py`?

Target `libpysal/kernels.py`.

> We would need to intercept arguments in order to force all kernels to taper by default within `weights`, but this should not be too much code.

Agreed.
I think this test assumes we don't taper the 0 weights: https://github.com/pysal/libpysal/blob/71ace916cbc82b11228f0365b3a419808049a5cc/libpysal/graph/tests/test_kernel.py#L333 However, the default in main is to taper: https://github.com/pysal/libpysal/blob/71ace916cbc82b11228f0365b3a419808049a5cc/libpysal/graph/_kernel.py#L86
Not sure why the tests are passing in main?
@jGaboardi do we want to add `pyproj` to the deps in `ci/313-min` to address these failing tests?
No, min should have the minimal set of deps to test our ability to work only with required deps. We should conditionally skip if pyproj is not available.
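For example, the standard pytest pattern for that kind of conditional skip (the test name here is only a placeholder):

```python
import pytest

# Skip everything in this module when pyproj is not installed,
# instead of adding pyproj to the minimal environment.
pyproj = pytest.importorskip("pyproj")

def test_requires_pyproj():
    assert pyproj is not None  # ... real test body would use pyproj here ...
```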