pygsp icon indicating copy to clipboard operation
pygsp copied to clipboard

GraphWave?

Open mdeff opened this issue 3 years ago • 13 comments

From @pulquero [https://github.com/epfl-lts2/pygsp/issues/94#issuecomment-803219578]:

Btw, I have an implementation of graphwave (about 10 lines), if that would also be of interest.

mdeff avatar Mar 20 '21 01:03 mdeff

I'm not sure whether that's in scope. But IIRC they use GSP / spectral graphs concept. Right? Should be pretty close to our features.compute_spectrogram(). So it could go in that module (which would benefit from some better documentation). @nperraud what do you think?

@pulquero you could paste your implementation here.

mdeff avatar Mar 20 '21 01:03 mdeff

Ref: https://arxiv.org/abs/1710.10321

pulquero avatar Mar 20 '21 01:03 pulquero

It is essentially the characteristic function of the tig. This includes three ways to calculate it, but maybe you just want to go with the middle one, it's a reasonable balance between speed and memory.

def embedding(g, ts, nodes = None, method = None, **kwargs):
    """
    GraphWave embedding.
    ts = np.linspace(0, 100, 25)
    Returns a tensor of (nodes, filters, chi)
    """
    N = g.G.N
    if nodes is None:
        s = np.identity(N)
    else:
        s = np.zeros((N, len(nodes)))
        for i, n in enumerate(nodes):
            s[n][i] = 1.0
    tig = g.filter(s[..., np.newaxis], **kwargs)
    if s.shape[1] == 1: # single node
        tig = tig[:, np.newaxis, :]
    if tig.ndim == 2: # single filter
        tig = tig[..., np.newaxis]
    assert tig.shape == s.shape + (tig.shape[2],)

    if method is None:
        if N < 100:
            method = 'kron'
        elif N < 10000:
            method = 'partial-kron'
        else:
            method = 'loop'

    if method == 'kron':
        # fully vectorized
        tig_t_grid = np.kron(tig[..., np.newaxis], ts)
        def chi(xt):
            return np.mean(np.exp(xt*1j), axis=0)
        return chi(tig_t_grid)
    elif method == 'partial-kron':
        # more memory efficient
        def chi(xt):
            return np.mean(np.exp(xt*1j), axis=0)
        emb = np.empty((tig.shape[1], tig.shape[2], ts.shape[0]), dtype=complex)
        for i in range(tig.shape[1]):
            for j in range(tig.shape[2]):
                tig_t_grid = np.kron(tig[:,i,j, np.newaxis], ts)
                emb[i][j] = chi(tig_t_grid)
        return emb
    elif method == 'loop':
        # every byte counts
        def chi(x, t):
            return np.mean(np.exp(x*t*1j), axis=0)
        emb = np.empty((tig.shape[1], tig.shape[2], ts.shape[0]), dtype=complex)
        for i in range(tig.shape[1]):
            for j in range(tig.shape[2]):
                for k, t in enumerate(ts):
                    emb[i][j][k] = chi(tig[:,i,j], t)
        return emb

`

pulquero avatar Mar 20 '21 02:03 pulquero

Thanks!

It is essentially the characteristic function of the tig.

That's good.

Waiting on @nperraud to weight in. Then we can see how to integrate it properly.

mdeff avatar Mar 20 '21 02:03 mdeff

I'm interested to see how compute_spectrogram() compares. Any refs I can read? Looks like it should work in a similar way. Previously, I've looked at spectral graph wavelet signature (https://arxiv.org/abs/1705.06250), but this is only really suitable for equally-sized graphs - signature doesn't really transfer across arbitrary graphs. GraphWave should be more 'portable' in that sense, although it classifies individual nodes rather than graphs, but you can always take a bag-of-features approach like with the signature (essentially that is just built for the raw filter coefficients, GraphWave goes a little bit further by applying the characteristic function).

pulquero avatar Mar 20 '21 02:03 pulquero

I too believe it should be quite similar. compute_spectrogram() was developed by @nperraud. AFAIK it has only been published in his thesis (chapter 3).

20210320_035426

mdeff avatar Mar 20 '21 02:03 mdeff

The main difference between https://arxiv.org/abs/1710.10321 (at least the first version of the paper) and my thesis is that we use different way to aggregate the features. They make some histogram I believe, and I compute the L2 norm. The reason why I use the L2 norm is that I can compute the features for all the nodes very efficiently in that way. See my thesis (chapter 3) for more information. I never published this and nobody knows that I did that, also it was done in 2017. If you are interested into developing this, please do contact me privately.

nperraud avatar Mar 20 '21 10:03 nperraud

Shall we consolidate compute_spectrogram() and GraphWave in the PyGSP?

mdeff avatar Mar 20 '21 15:03 mdeff

We probably should... But this would have to come with a nice demo/tutorial: how to compute node features. I could try to do a proposition soon.

nperraud avatar Mar 20 '21 22:03 nperraud

Please also add a nodes parameter to compute_spectrogram() so it can be computed for subsets. (I use this to batch up nodes and distribute on spark).

pulquero avatar Mar 22 '21 14:03 pulquero

Btw, I believe there is a 'technical' bug in compute_norm_tig. I think the norm should be over the 'nodes' (axis=0) not the 'signal'.

pulquero avatar Mar 22 '21 15:03 pulquero

The main difference between https://arxiv.org/abs/1710.10321 (at least the first version of the paper) and my thesis is that we use different way to aggregate the features. They make some histogram I believe, and I compute the L2 norm. The reason why I use the L2 norm is that I can compute the features for all the nodes very efficiently in that way. See my thesis (chapter 3) for more information. I never published this and nobody knows that I did that, also it was done in 2017. If you are interested into developing this, please do contact me privately.

We should have written a short paper 5 years ago ...

vandergh avatar Mar 23 '21 10:03 vandergh

(Heat) kernel centrality should be added too (norm of the tig), being very similar to the other two.

pulquero avatar Apr 13 '21 10:04 pulquero