spopt
[WIP] allow for precomputed distance matrices
This allows for precomputed distance matrices in the SKATER base class, `skater.SpanningForest`. For a usage example:

```python
import numpy
from libpysal import weights
from scipy.spatial import distance_matrix
from spopt.region.skater import SpanningForest

# ten observations with two features each
r = numpy.random.normal(size=(10, 2))
D = distance_matrix(r, r, p=1)  # l1 (Manhattan) metric
w = weights.lat2W(5, 2)  # contiguity weights for a 5-by-2 lattice (10 areas)
SpanningForest(dissimilarity='precomputed').fit(5, w, D)  # 5 clusters
```
Now, caveat emptor: the semantics are a little different here @Shruti-Patil, since this converts the score into minimizing the sum of dissimilarities within the clusters, rather than minimizing the distance between features and the feature centroid of the cluster.
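To make that difference concrete, here is a hypothetical sketch (not spopt's internal code) contrasting the two scores for a fixed cluster labelling:

```python
import numpy
from scipy.spatial import distance_matrix

numpy.random.seed(0)
X = numpy.random.normal(size=(10, 2))
labels = numpy.array([0] * 5 + [1] * 5)  # an illustrative 2-cluster assignment
D = distance_matrix(X, X, p=1)

# Centroid-style score: total L1 deviation of each observation
# from its cluster's feature mean.
centroid_score = sum(
    numpy.abs(X[labels == k] - X[labels == k].mean(axis=0)).sum()
    for k in (0, 1)
)

# Precomputed-style score: sum of pairwise dissimilarities
# within each cluster (each pair counted once).
within_score = sum(
    D[numpy.ix_(labels == k, labels == k)].sum() / 2
    for k in (0, 1)
)
```

The two scores generally rank candidate partitions differently, which is the semantic change being flagged here.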
If you'd like to try it on your data, use pip install git+https://github.com/ljwolf/spopt and follow the example above.
Codecov Report
Merging #188 (3aca9a9) into main (42520cc) will decrease coverage by 0.4%. The diff coverage is 33.3%.
```diff
@@            Coverage Diff            @@
##             main     #188     +/-  ##
=========================================
- Coverage    64.5%    64.1%    -0.4%
=========================================
  Files          17       17
  Lines        1771     1785      +14
  Branches      343      350       +7
=========================================
+ Hits         1143     1145       +2
- Misses        576      583       +7
- Partials       52       57       +5
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| spopt/region/skater.py | 76.4% <33.3%> (-5.9%) | :arrow_down: |
Should we add a test for this or is it good without?
solution for #187
I am not sure I want to merge this unless more people beyond @Shruti-Patil find it useful. In the abstract, it seems like a good idea, and I'm all for user power. But, empirically, I haven't seen good performance when minimizing pre-computed dissimilarities.
I suppose the trick is that:
- For the current implementation, decisions directly minimize the deviation of the data relative to its center (median, mean, or any other user-supplied reduction).
- In the precomputed case implemented here, we can only minimize the dissimilarity within the cluster. Moreover, there's no clear guidance on whether this should be the total feature dissimilarity matrix, or the dissimilarity matrix after filtering by the possible joins (so that we only consider the dissimilarity of "connected" observations).
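The two candidate scores in the second point can be sketched as follows (hypothetical helper functions for illustration, not part of spopt):

```python
import numpy

def total_within_dissimilarity(D, members):
    """Sum of all pairwise dissimilarities among cluster members."""
    members = numpy.asarray(members)
    return D[numpy.ix_(members, members)].sum() / 2  # each pair counted once

def join_filtered_dissimilarity(D, members, adjacency):
    """Sum of dissimilarities over 'connected' pairs of members only.

    `adjacency` maps each observation to its neighbours, in the style
    of libpysal's `W.neighbors` dictionary.
    """
    members = set(int(m) for m in members)
    total = 0.0
    for i in members:
        for j in adjacency.get(i, ()):
            if j in members and i < j:  # count each edge once
                total += D[i, j]
    return total

# toy data: four observations on a line at 0, 1, 2, and 10
points = numpy.array([0.0, 1.0, 2.0, 10.0])
D = numpy.abs(points[:, None] - points[None, :])  # L1 distances
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a path graph

total = total_within_dissimilarity(D, [0, 1, 2])                 # 1 + 2 + 1 = 4
filtered = join_filtered_dissimilarity(D, [0, 1, 2], adjacency)  # 1 + 1 = 2
```

The gap between the two (here, 4 versus 2) grows with cluster size, since the total score counts every pair while the filtered score only counts contiguous ones.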
So, without further empirical work (on our end) to verify which is the right score, I don't want this to land.
> without further empirical work (on our end) to verify which is the right score, I don't want this to land.
This is reasonable.
@ljwolf Shall we go ahead and close this out as stale?
The OP went in another direction for a solution, so let's close.