libpysal icon indicating copy to clipboard operation
libpysal copied to clipboard

Interest in a mutual knn?

Open ljwolf opened this issue 2 years ago • 8 comments

For a different causal inference project, I'm writing a few spatial/feature matching algorithms.

I think we may want to offer a Mutual_KNN() constructor in Graph, and also bring a Symmetric_KNN()? or have symmetric/mutual options in a knn constructor?

This is like the current .symmetrize() function, but instead of adding edges to the KNN graph to induce symmetry, it removes edges to the KNN graph who are not mutually k-near.

This could also be implemented as a separate function for arbitrary graphs after weighting/kerneling, since it's just based on the edge set:

def mutual_knn(x, k=5):
    graph = KNN.from_array(x, k=k)
    directed_edges_array = graph.sparse != graph.sparse.T
    removed = (graph.sparse - directed_edges_array) > 0
    removed.eliminate_zeros()
    return WSP2W(WSP(removed))

ljwolf avatar Dec 08 '23 11:12 ljwolf

What about a keyword in symmetrize controlling if the symmetrization happens using addition or removal of asymmetric edges? It can be then exposed in the KNN builder via a keyword (feels better than another builder).

martinfleis avatar Dec 08 '23 11:12 martinfleis

That's a good idea imho!

ljwolf avatar Dec 08 '23 13:12 ljwolf

😆 i have an example in my new graph materials Screenshot 2023-12-08 at 8 34 22 AM

(note we still dont have symmetrize in the Graph yet)

knaaptime avatar Dec 08 '23 16:12 knaaptime

We probably want to think about the API, then... I would prefer something like the following, implemented in the utils.py module, amenable for any weight type:

def make_symmetric(w, drop=False):
    if drop:
        # keep only mutual neighbors
        ...
    else:
        # add all reversed links to edge set
        ...

Also, people often do (W+W.T)/2 to symmetrize weight values (in either inclusive or exclusive cases). This is not possible to induce via a standardisation trick, and idk how we might want to implement that...?

ljwolf avatar Dec 08 '23 17:12 ljwolf

Are we talking about implementing it in weights or graph? If the latter, then I'd like to have it as a method.

martinfleis avatar Dec 08 '23 17:12 martinfleis

definitely graph. I'd like to avoid implementing things in weights.

ljwolf avatar Dec 08 '23 18:12 ljwolf

@knaaptime a complete out of topic but since I've noticed it in your code above. Getting focals out of the Graph via neighbors is going to be super inefficient. Especially compared to .unique_ids which gives you nearly the same thing. g_knn10.unique_ids.to_series() gives you the same output but on a graph I have loaded in memory right now with 13890804 edges, the neighbors path takes 20.1s while unique_ids 186ms.

martinfleis avatar Dec 08 '23 22:12 martinfleis

lol i noticed that right after i posted that photo and updated it to unique ids. thanks for the close eye!

knaaptime avatar Dec 08 '23 22:12 knaaptime