NeuralOperators.jl
Fix GNO example
Codecov Report
Merging #79 (c51ef55) into main (54602e6) will decrease coverage by 5.58%. The diff coverage is 81.81%.
@@            Coverage Diff             @@
##             main      #79      +/-   ##
==========================================
- Coverage   95.70%   90.11%   -5.59%
==========================================
  Files          10       10
  Lines         163      172       +9
==========================================
- Hits          156      155       -1
- Misses          7       17      +10
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/graph_kernel.jl | 67.74% <81.81%> (-32.26%) | :arrow_down: |
Still needs some revision; discussed F2F.
What is the loss after training?
@MilkshakeForReal Please take a look at this; it is the relative L2 loss.
Update:
Sorry, I misunderstood the question. If you are asking about the value of the loss after training, I'll get back to you later, since the example is not implemented correctly yet.
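For reference, by the relative L2 loss I mean the usual $\|\hat{y} - y\|_2 / \|y\|_2$. A minimal sketch (the exact helper used in the example may differ):

```julia
using LinearAlgebra: norm

# Relative L2 loss for one prediction/target pair: ‖ŷ − y‖₂ / ‖y‖₂.
# Illustrative only; not necessarily the loss helper the example script uses.
rel_l2(ŷ, y) = norm(ŷ .- y) / norm(y)
```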
Yes, I'm asking about the value. The reason I'm asking is that I don't see any magical power in GraphKernel here. It is just an NNConv and could really be almost any GNN conv layer, and there is no justification in the paper for using it specifically. The power of GNO is likely due to the encoder and the sampling (or maybe something else?). Please let me know if the loss value is available, even if it's a large one.
I am not sure what kind of magic you expect to see in the GNO. Just like how the equations you mentioned in #74 are nothing but a message-passing NN?
The main idea of a neural operator is to learn the mapping in the spectral space. In GNO, this is implemented via a graph-signal Laplace transform and a message-passing neural network. @yuehhua is an expert on GNNs and may want to explain this in more detail.
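To make that concrete, the kernel-integral (message-passing) update in the GNO paper has roughly the form (restated here, so the notation may differ slightly from the paper):

$$
v_{t+1}(x) = \sigma\Big( W v_t(x) + \int_{B(x,r)} \kappa_\phi\big(x, y, a(x), a(y)\big)\, v_t(y)\, \mathrm{d}y \Big),
$$

where $\kappa_\phi$ is a learned kernel (a small neural network), $W$ is a pointwise linear map, and $B(x,r)$ is a ball of radius $r$ around $x$. The integral is what the message-passing layer approximates on the sampled graph.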
I think we are all doing science; there should be no magic. The only advantage GNO has over FNO is sampling: GNO doesn't require strictly grid-based sampling of the input functions, but FNO does. GNO should come out with at most the same performance as FNO.
If you expect to get GNO by just implementing the convolutional layer, you are expecting magic. As I already said, you must also implement the encoder and the Nyström approximation.
The Nyström approximation is already there, and the encoder is the GraphKernel. As for the graph convolutional layer, it is just a generalization of a regular convolutional layer. The only magic should come from the non-linearity.
Please read Equation 7 carefully and make sure you understand what each term means.
So, you mean the projection $v_0(x) = P(x, a(x), a_{\epsilon}(x), \nabla a_{\epsilon}(x)) + p$?
@MilkshakeForReal About the encoder, do you mean the GaussianNormalizer?
Of course, the Nyström approximation is the more important thing. It is not clear where you have implemented it. I'll check back, but now I really have to go to bed.
@MilkshakeForReal Take your time, and please feel free to open a PR if you still think there is anything wrong 😄
The $a_{\epsilon}(x)$ is the encoded $a(x)$ and the encoder is just a linear transform...
Ok, I'm back. In the spirit of scientific research, I don't want to discourage you from trying different things out. I do want to see the loss(es) first, but for now allow me to comment on Equation 7.
No, it's not the GaussianNormalizer. That normalizes the data, it doesn't smooth it. The smoothed functions are already generated in the data, see here. Unfortunately, there is no code showing how they were generated (or I missed it). The linear map P is less important; I was not talking about that. Please note the motivation behind it:
> Due to the smoothing effect of the inverse elliptic operator in (3) with respect to the input data a (and indeed f when we consider this as input)
So the authors know how the solution operator acts on a(x) and encode that information into the input. The model they actually use is restricted to the elliptic PDE (4), and in the original paper they only test their model on this particular PDE :sweat_smile:. You can try removing $a_{\epsilon}(x)$ from the input and see how it affects the performance. That will show how general GNO actually is.
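To make the normalize-vs-smooth distinction concrete, here is a rough sketch (neither function is the package's API, and the dataset's $a_{\epsilon}$ was not necessarily generated this way):

```julia
using Statistics: mean, std

# A Gaussian normalizer standardizes the values; it does not smooth the field.
gaussian_normalize(a) = (a .- mean(a)) ./ std(a)

# Smoothing, by contrast, convolves the field with a kernel. A 1-D Gaussian
# smoother with a truncated kernel and replicated edge values, for illustration:
function gaussian_smooth(a::AbstractVector, σ::Real; radius::Int = ceil(Int, 3σ))
    w = [exp(-k^2 / (2σ^2)) for k in -radius:radius]
    w ./= sum(w)                    # normalize the kernel weights
    n = length(a)
    return [sum(w[j + radius + 1] * a[clamp(i + j, 1, n)] for j in -radius:radius) for i in 1:n]
end
```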
> GNO doesn't require strictly grid-based sampling of the input functions, but FNO does.
No, FNO does not require that; only the FFT does. A general DFT can be performed on a nonuniform grid.
> No, FNO does not require that; only the FFT does. A general DFT can be performed on a nonuniform grid.
Oh yeah, that's true. But could you give me a nonuniform general DFT in practice?
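A naive nonuniform DFT is just the direct sum; in practice you would use a fast NUFFT library (e.g. NFFT.jl in Julia, or FINUFFT). A minimal sketch, illustrative only:

```julia
# Naive nonuniform DFT: Fourier coefficients of samples v taken at arbitrary
# locations x ∈ [0, 1). O(N·M); a NUFFT library does this much faster.
function nudft(v::AbstractVector, x::AbstractVector, K::Integer)
    @assert length(v) == length(x)
    N = length(v)
    return [sum(v[n] * cis(-2π * k * x[n]) for n in 1:N) / N for k in -K:K]
end

x = sort(rand(128))     # nonuniform sample locations
v = sin.(2π .* x)       # sampled function values
v̂ = nudft(v, x, 8)      # approximate coefficients for |k| ≤ 8
```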
@yuehhua Is this ready to merge?
@MilkshakeForReal For operating on an irregular domain, you may want to check out Geo-FNO.
@foldfelis CPU works, but not GPU. So, currently, not yet.
Thanks for the info. Is Nyström approximation implemented in GraphSignals.generate_grid?
@MilkshakeForReal To my understanding, there is no separate implementation of the Nyström approximation; the Nyström approximation is just a way to approximate a kernel in an RKHS. It is what makes the model actually computable; otherwise it is just an abstract mathematical concept. Please check Section 3 of the paper, p. 8.
I wouldn't need to, unless a new version has just been published. If you haven't implemented it, that's what I needed to know.
In other words, GraphKernel, which is the GNN approximator itself, is the result of the Nyström approximation of the kernel for the PDE.
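In equation form: the kernel integral is never evaluated exactly; it is replaced by an average over sampled neighbors (the Nyström / Monte Carlo step; restated here, so the notation may differ slightly from the paper):

$$
\int_{B(x,r)} \kappa_\phi\big(x, y, a(x), a(y)\big)\, v_t(y)\, \mathrm{d}y \;\approx\; \frac{1}{|\mathcal{N}(x)|} \sum_{y \in \mathcal{N}(x)} \kappa_\phi\big(x, y, a(x), a(y)\big)\, v_t(y),
$$

where $\mathcal{N}(x)$ is the set of sampled nodes within radius $r$ of $x$, i.e. exactly the sum a message-passing layer computes over the graph's edges.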
@yuehhua Great work, thanks!