
why the Chebyshev polynomials is needed?

Open zeal-up opened this issue 5 years ago • 16 comments

When approximating the initial filter kernel gθ(Λ) = diag(θ) with a polynomial gθ(Λ) = Σ_{k=0}^{K−1} θ_k Λ^k, why can't we simply combine the eigenvector and eigenvalue matrices, i.e. gθ(L) = U gθ(Λ) Uᵀ = Σ_{k=0}^{K−1} θ_k L^k, since U Λ^k Uᵀ = (U Λ Uᵀ)^k = L^k? The paper instead uses Chebyshev polynomials, gθ(Λ) = Σ_{k=0}^{K−1} θ_k T_k(Λ̃) with Λ̃ = 2Λ/λ_max − I_n. I'm quite confused here: why are the Chebyshev polynomials needed?

zeal-up avatar Aug 23 '18 11:08 zeal-up

+1, I was doing a GCN paper review and read this paper today, and came up with the same question. We can compute Lx, L²x, ..., L^K x sequentially using O(K|E|) operations and linearly combine them; the computational complexity is the same.
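That sequential scheme is straightforward to sketch. Below is a minimal NumPy/SciPy illustration (the `monomial_filter` name, the toy path graph, and the coefficients are all made up for the example): each extra power of L costs one sparse matrix-vector product, so K terms cost O(K|E|) total.

```python
import numpy as np
import scipy.sparse as sp

def monomial_filter(L, x, theta):
    """Apply sum_k theta[k] * L^k to x with K sparse matrix-vector
    products (O(K|E|) total), never materializing L^k itself."""
    out = theta[0] * x
    Lkx = x
    for t in theta[1:]:
        Lkx = L @ Lkx        # L^k x from L^(k-1) x: one sparse matvec
        out = out + t * Lkx
    return out

# toy example: path graph on 4 nodes, combinatorial Laplacian L = D - A
A = sp.csr_matrix(np.array([[0, 1, 0, 0],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=float))
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A
x = np.array([1.0, 2.0, 3.0, 4.0])
y = monomial_filter(L, x, [0.5, 1.0, 0.25])   # 0.5*x + L@x + 0.25*L^2@x
```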

ZiqianXie avatar May 22 '19 22:05 ZiqianXie

I think in the cited wavelets-on-graphs paper (Hammond et al.), they used Chebyshev polynomials to approximate a known wavelet function because Chebyshev interpolation has certain desirable properties. In the case where the filters are unknown, the Chebyshev polynomials are not necessary: any linear combination of Chebyshev polynomials up to the Kth degree can be represented by a Kth-degree polynomial.
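The equal-representation-power claim is easy to check numerically: NumPy's `cheb2poly` converts a Chebyshev-basis expansion exactly into an ordinary monomial-basis polynomial of the same degree (the coefficients below are arbitrary toy values):

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P

# Coefficients of c0*T0 + c1*T1 + c2*T2 + c3*T3 in the Chebyshev basis...
cheb_coefs = [0.5, -1.0, 2.0, 0.25]
# ...converted exactly to a degree-3 polynomial in the monomial basis:
mono_coefs = C.cheb2poly(cheb_coefs)

# Both expansions evaluate to the same function everywhere:
xs = np.linspace(-1.0, 1.0, 11)
assert np.allclose(C.chebval(xs, cheb_coefs), P.polyval(xs, mono_coefs))
```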

ZiqianXie avatar May 23 '19 02:05 ZiqianXie

Has anyone tried node classification on a general graph without Chebyshev polynomials (i.e. using only the Laplacian and its powers)? What was the accuracy compared to the Chebyshev method?

u42dani avatar Mar 26 '20 14:03 u42dani

See here: https://www.youtube.com/watch?v=v3jZRkvIOIM, at minute 26.

Keramatfar avatar Apr 15 '20 14:04 Keramatfar

Has anyone tried node classification on a general graph without Chebyshev polynomials (i.e. using only the Laplacian and its powers)? What was the accuracy compared to the Chebyshev method?

I have used the Laplacian, and the normalized Laplacian, to train a convolutional NN. The accuracy on the training, test, and validation sets was above 85% in each case.

The data I used was proxy-related, and the model was for a security-related problem. Doing that is actually how I found this work, and Xavier's work as a whole... so I've seen it work, but Xavier Bresson and his team build on those concepts so well...

So, not the best modeling approach, but it is tractable for certain problems.

Jovonni avatar Apr 15 '20 15:04 Jovonni

See here: https://www.youtube.com/watch?v=v3jZRkvIOIM, at minute 26.

Yes, it is stable under coefficient perturbation but otherwise the complexities are the same.

ZiqianXie avatar Apr 15 '20 15:04 ZiqianXie

Any polynomial (monomials, Chebyshev, Legendre, etc.) of order K has the same representation power and can be used. We used Chebyshev because we had experience with it and thought it would be easier to optimize as they form an orthogonal basis. There's however not much difference in practice.

Chebyshev polynomials are more orthogonal (under the identity measure, and truly orthogonal under the measure 1/sqrt(1-x²)) than monomials, but less than Legendre (truly orthogonal under the identity measure):

[figure: polynomial_bases_spectrum]

In the vertex domain (for a ring/circle graph), Chebyshev and Legendre polynomials are also more orthogonal than monomials:

[figure: polynomial_bases_vertex]
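For reference, ChebNet evaluates these polynomials without any eigendecomposition, using the stable three-term Chebyshev recurrence on the rescaled Laplacian. A minimal NumPy/SciPy sketch of that evaluation (the `chebyshev_filter` name, the toy graph, and the coefficients are made up for illustration):

```python
import numpy as np
import scipy.sparse as sp

def chebyshev_filter(L, x, theta, lmax=2.0):
    """Apply sum_k theta[k] * T_k(L_tilde) to x via the three-term
    recurrence T_k(Lt)x = 2*Lt*T_{k-1}(Lt)x - T_{k-2}(Lt)x, where
    L_tilde = 2*L/lmax - I rescales the spectrum into [-1, 1]."""
    Lt = (2.0 / lmax) * L - sp.identity(L.shape[0])
    T_prev, T_cur = x, Lt @ x
    out = theta[0] * T_prev + (theta[1] * T_cur if len(theta) > 1 else 0.0)
    for t in theta[2:]:
        T_prev, T_cur = T_cur, 2.0 * (Lt @ T_cur) - T_prev
        out = out + t * T_cur
    return out

# toy example: normalized Laplacian of a 4-node path graph (lmax <= 2)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
Ln = np.eye(4) - A / np.sqrt(np.outer(d, d))   # I - D^{-1/2} A D^{-1/2}
x = np.array([1.0, 2.0, 3.0, 4.0])
y = chebyshev_filter(sp.csr_matrix(Ln), x, [0.5, 1.0, 0.25, -0.5])
```

Each recurrence step is again one sparse matvec, so the cost matches the monomial basis: O(K|E|).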

mdeff avatar Jul 20 '20 16:07 mdeff

I have the same question as @zeal-github, and opened a question on Stack Exchange. After reading this issue, I think the question is clear now: Chebyshev polynomials may have better properties with respect to parameter orthogonality, but the computational complexity is the same as directly using the Laplacian matrix.

youkaichao avatar Dec 11 '20 12:12 youkaichao

Any polynomial (monomials, Chebyshev, Legendre, etc.) of order K has the same representation power and can be used. We used Chebyshev because we had experience with it and thought it would be easier to optimize as they form an orthogonal basis. There's however not much difference in practice.

Chebyshev polynomials are more orthogonal (under the identity measure, and truly orthogonal under the measure 1/sqrt(1-x²)) than monomials, but less than Legendre (truly orthogonal under the identity measure):

[figure: polynomial_bases_spectrum]

In the vertex domain (for a ring/circle graph), Chebyshev and Legendre polynomials are also more orthogonal than monomials:

[figure: polynomial_bases_vertex]

Why did you choose the Chebyshev polynomials of the first kind instead of the Legendre polynomials?

hazdzz avatar Mar 30 '21 00:03 hazdzz

Mostly for historical reasons. At first we were designing filters (e.g., to solve the diffusion of heat, the propagation of waves, and many others). As Chebyshev polynomials are excellent function approximators, we chose them to approximate those ideal filters we wanted to design. When learning filters, the choice of a polynomial basis doesn't matter for the expressivity of the filters, so we kept Chebyshev as we were familiar with them. We thought it could be easier to optimize, but that doesn't seem to make a difference in practice.

mdeff avatar Mar 30 '21 12:03 mdeff

Mostly for historical reasons. At first we were designing filters (e.g., to solve the diffusion of heat, the propagation of waves, and many others). As Chebyshev polynomials are excellent function approximators, we chose them to approximate those ideal filters we wanted to design. When learning filters, the choice of a polynomial basis doesn't matter for the expressivity of the filters, so we kept Chebyshev as we were familiar with them. We thought it could be easier to optimize, but that doesn't seem to make a difference in practice.

In your opinion, it doesn't matter which orthogonal polynomials are chosen, because in practice the performance does not depend on the kind of orthogonal polynomial. Am I understanding correctly?

hazdzz avatar Mar 30 '21 12:03 hazdzz

Yes. It seems that SGD works well enough to find the best polynomial, whatever the basis. That might also be because we use low-order polynomials when learning.

mdeff avatar Mar 30 '21 12:03 mdeff

Yes. It seems that SGD works well enough to find the best polynomial, whatever the basis. That might also be because we use low-order polynomials when learning.

What is SGD?

hazdzz avatar Mar 30 '21 12:03 hazdzz

Mostly for historical reasons. At first we were designing filters (e.g., to solve the diffusion of heat, the propagation of waves, and many others). As Chebyshev polynomials are excellent function approximators, we chose them to approximate those ideal filters we wanted to design. When learning filters, the choice of a polynomial basis doesn't matter for the expressivity of the filters, so we kept Chebyshev as we were familiar with them. We thought it could be easier to optimize, but that doesn't seem to make a difference in practice.

I have another big question. Do the orthogonal polynomials cause a low-pass filter? Or is Spectral CNN (Spectral Networks and Deep Locally Connected Networks on Graphs) itself a low-pass-filter GNN?

hazdzz avatar Mar 31 '21 03:03 hazdzz

Neither spectral nor polynomial convolutions enforce low-pass filters. Polynomial convolutions enforce local filters (the locality is controlled by the order of the polynomials). Local filters can be high-pass (simply think about a multiplication by the Laplacian itself, a second-order derivative). The low-pass limitation is specific to GCN [Kipf et al.] as they merged the 0- and 1-neighborhood in the quest of learning a single parameter per filter.
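The Laplacian-as-high-pass remark above is easy to verify on a toy ring graph (a small made-up NumPy example): multiplying by L annihilates the constant (lowest-frequency) signal and amplifies the alternating (highest-frequency) one.

```python
import numpy as np

# Ring graph on 6 nodes, combinatorial Laplacian L = D - A.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

smooth = np.ones(n)                        # constant: lowest graph frequency
rough = np.array([1.0, -1.0] * (n // 2))   # alternating: highest frequency

# L kills the smooth signal and amplifies the rough one,
# i.e. multiplication by L itself acts as a high-pass filter:
assert np.allclose(L @ smooth, 0.0)
assert np.allclose(L @ rough, 4.0 * rough)
```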

mdeff avatar Mar 31 '21 13:03 mdeff