
Weak adversarial network PDE solvers

Open ChrisRackauckas opened this issue 4 years ago • 12 comments

https://arxiv.org/abs/1907.08272

ChrisRackauckas avatar Aug 29 '19 22:08 ChrisRackauckas

I would like to work on this issue. I have read and understood the paper and related concepts.

I have done a crude implementation of the paper for solving elliptic PDEs with a Dirichlet boundary condition, using the network hyperparameters described in the paper (section 4.1.2) for dimensions = 5. It can be found here. I trained it for 500 iterations (of the 20,000 stated) and the network seems to be training reasonably well.
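Roughly, the objective I set up looks like the sketch below (illustrative names only, not the actual code linked above; I approximate the input gradients ∇u and ∇φ with finite differences here just to keep the snippet self-contained, whereas a real implementation would differentiate the networks with AD):

```julia
# Sketch of the weak adversarial network (WAN) objective for -Δu = f on Ω = [0,1]^d
# with Dirichlet data u = g on ∂Ω. primal_net plays u_θ, test_net plays the
# adversarial test function φ_η.
using Flux
using Statistics: mean

d = 5                                                                        # spatial dimension (paper §4.1.2 example)
primal_net = Chain(Dense(d, 40, tanh), Dense(40, 40, tanh), Dense(40, 1))    # u_θ
test_net   = Chain(Dense(d, 40, tanh), Dense(40, 40, tanh), Dense(40, 1))    # φ_η

# Central finite-difference gradient of a scalar-output network w.r.t. its input.
function input_grad(net, x; h = 1f-3)
    map(1:length(x)) do i
        e = zeros(Float32, length(x)); e[i] = h
        (net(x .+ e)[1] - net(x .- e)[1]) / (2h)
    end
end

# Monte Carlo estimate of the weak-form residual ⟨A[u_θ], φ_η⟩ ≈ ∫_Ω (∇u·∇φ - f φ),
# normalized by ‖φ_η‖², plus a boundary penalty for the Dirichlet condition.
function wan_losses(xs_interior, xs_boundary, f, g; λ = 1f4)
    weak = mean(xs_interior) do x
        sum(input_grad(primal_net, x) .* input_grad(test_net, x)) - f(x) * test_net(x)[1]
    end
    φnorm = mean(x -> test_net(x)[1]^2, xs_interior)
    bdry  = mean(x -> (primal_net(x)[1] - g(x))^2, xs_boundary)
    primal_loss = weak^2 / φnorm + λ * bdry      # minimized over the primal parameters θ
    test_loss   = -weak^2 / φnorm                # maximized over η (so we minimize the negative)
    return primal_loss, test_loss
end
```

The training loop then alternates a few descent steps on `primal_loss` for u_θ with a few descent steps on `test_loss` (i.e. ascent on the weak residual) for φ_η, as in the paper's adversarial training loop.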

Could you guide me on how I should proceed with this?

  • Should I train this currently implemented example till converged?
  • Should I try out one more example on a parabolic equation involving time as well?
  • Or should I directly proceed with writing the API for this PDE solver?

Ayushk4 avatar Oct 06 '19 11:10 Ayushk4

Should I train this currently implemented example till converged?

Yes, let's make sure it converges "all of the way" first. It still looks like it's missing some of the corner behavior.

Should I try out one more example on a parabolic equation involving time as well?

Try to get it working where, instead of writing down the discretization, it should just call DifferentialEquations.jl. This can use DiffEqFlux.jl's diffeq_adjoint to make sure the adjoint for training is the fast one.
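Roughly the call pattern I have in mind is sketched below. The right-hand side, the way the network enters it, and the sizes are all placeholders, and I'm assuming the `diffeq_adjoint(p, prob, alg; kwargs...)` form from the DiffEqFlux docs:

```julia
# Hedged sketch: let DifferentialEquations.jl do the time stepping and
# differentiate through the solve with an adjoint, rather than hand-writing
# the time discretization. `rhs` is an illustrative semidiscretization, not
# the actual WAN one.
using DifferentialEquations, DiffEqFlux, Flux

net = Chain(Dense(2, 16, tanh), Dense(16, 1))
p, re = Flux.destructure(net)            # flat parameter vector + reconstructor
                                         # (destructure lived in DiffEqFlux on older versions)

# Some semidiscretized right-hand side du/dt = f(u, p, t) that uses the network.
function rhs(u, p, t)
    m = re(p)
    return vec(m(vcat(u', fill(t, 1, length(u)))))
end

u0 = zeros(Float32, 8)
prob = ODEProblem(rhs, u0, (0f0, 1f0), p)

# diffeq_adjoint solves `prob` at parameters `p` and hooks up the adjoint
# sensitivity, so reverse-mode gradients w.r.t. p are the fast ones.
loss(p) = sum(abs2, diffeq_adjoint(p, prob, Tsit5(); saveat = 0.1f0))
# gradients for training then come from e.g. Flux.gradient(loss, p)
```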

Or should I directly proceed with writing the API for this PDE solver?

Let's wait on that until we know it's all working, then slap an API on it and "package it up"

ChrisRackauckas avatar Oct 06 '19 13:10 ChrisRackauckas

Yes, let's make sure it converges "all of the way" first. It still looks like it's missing some of the corner behavior.

I trained it until convergence and it seems to fit well for dims = 20. I have uploaded the plots and notebooks for this. For dims = 5, however, the loss went NaN after about 6400 iterations because the adversarial network's output went to Inf. Even so, it seemed to fit pretty well. I believe some hyperparameter tuning should do the trick for this case.

I moved the models to the GPU and trained them there, since training was slow on the CPU.

I ran into a couple of GPU-specific problems while training: very slow backpropagation through sinc, tanh not working as an activation, and problems with the AdaGrad optimiser. I got it running after some workarounds. I believe most of these are Tracker and perhaps version related. After I am done with this PR, I will check them on the latest versions of Julia and Flux/Zygote and raise issues if they still persist.

Try to get it working where, instead of writing down the discretization, it should just call DifferentialEquations.jl. This can use DiffEqFlux.jl's diffeq_adjoint to make sure the adjoint for training is the fast one.

I am now working on getting this to call DifferentialEquations.jl instead.

Please let me know if you have something else in mind?

Ayushk4 avatar Oct 13 '19 04:10 Ayushk4

I ran into a couple of GPU-specific problems while training: very slow backpropagation through sinc, tanh not working as an activation, and problems with the AdaGrad optimiser. I got it running after some workarounds. I believe most of these are Tracker and perhaps version related. After I am done with this PR, I will check them on the latest versions of Julia and Flux/Zygote and raise issues if they still persist.

Some of these may just be slow GPU kernels. It could be worth isolating them to get an MWE for the GPU developers.
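Something along these lines would do as a starting point, assuming a Zygote-based `gradient` and CUDA.jl-style GPU arrays (the Tracker/CuArrays versions you're on would look similar, just with different package names):

```julia
# Minimal timing MWE: time the forward and backward pass of a broadcasted
# `sinc` vs `tanh` on the GPU in isolation, so a slow kernel can be reported
# to the GPU developers.
using Flux, CUDA

x = CUDA.rand(Float32, 1024, 1024)

for f in (sinc, tanh)
    loss(y) = sum(f.(y))
    loss(x); gradient(loss, x)        # warm up: compile the GPU kernels first
    println(f)
    CUDA.@time loss(x)                # forward pass alone
    CUDA.@time gradient(loss, x)      # forward + backward pass
end
```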

ChrisRackauckas avatar Oct 13 '19 11:10 ChrisRackauckas

Please let me know if you have something else in mind?

Sounds great!

ChrisRackauckas avatar Oct 13 '19 11:10 ChrisRackauckas

instead of writing down the discretisation, it should just call DifferentialEquations.jl.

I have a question here. For PDEs involving time, the paper mentions two methods. The first outputs N (the number of time segments) different sets of parameter values for the primal network, one per timestep. The other changes the weak formulation so that a single set of parameter values covers the whole time interval. Which of these two methods should I try this with? Could you also guide me on how to proceed by pointing me to some resources?
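To make the distinction concrete, this is how I understand the two parameterizations (an illustrative sketch only, with made-up layer sizes):

```julia
using Flux

d, N = 5, 10

# First variant (time-stepping): one copy of the primal network's parameters
# per time segment; these would be advanced segment by segment.
primal_per_step = [Chain(Dense(d, 40, tanh), Dense(40, 1)) for _ in 1:N]
u_segment(k, x) = primal_per_step[k](x)          # solution on the k-th segment

# Second variant (space-time weak form): a single network taking (x, t), so
# one set of parameters covers the whole time interval.
primal_spacetime = Chain(Dense(d + 1, 40, tanh), Dense(40, 1))
u(x, t) = primal_spacetime(vcat(x, t))
```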

Ayushk4 avatar Oct 27 '19 13:10 Ayushk4

ping @ChrisRackauckas . ^

Ayushk4 avatar Oct 29 '19 14:10 Ayushk4

Hmm, I was thinking of Algorithm 2. That can be any time discretization, so time step that with DiffEq. But now I see you can't because you don't end up with a differential equation for the evolution. Instead you need to optimize at each step, so what I was thinking wasn't possible, so ignore that :). Instead stick to the paper here.

ChrisRackauckas avatar Nov 02 '19 08:11 ChrisRackauckas

We can generalize it to use arbitrary tableaus, but that can come later. Method 3 might be better anyway.

ChrisRackauckas avatar Nov 02 '19 08:11 ChrisRackauckas

Sure. I am proceeding with Algorithm 3 then.

Ayushk4 avatar Nov 02 '19 09:11 Ayushk4

I have implemented Algorithm 3 as in the paper (link). I have uploaded plots (and separate images) for it. It seems to have trained well, but took rather long to train, ~10 hrs on a GPU.

I was thinking about equation (12) from method 3, which has to be minimized. Instead of numerical integration (for the outer time integral), maybe we can use DifferentialEquations.jl. It could perhaps be written as solving dy/dt = f(y,t) up to T = 1 with the initial value at t = 0 given. Is it worth a shot?
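Concretely, I mean something like the following sketch, where the integrand `g` is just a placeholder for the spatial part of equation (12) at fixed network parameters:

```julia
# Evaluate a time integral ∫₀ᵀ g(t) dt by solving the ODE dy/dt = g(t), y(0) = 0,
# and reading off y(T), letting the adaptive ODE solver do the quadrature.
using DifferentialEquations

g(t) = exp(-t) * sin(2π * t)                     # placeholder integrand, NOT the real eq. (12) term

prob = ODEProblem((y, p, t) -> g(t), 0.0, (0.0, 1.0))
sol  = solve(prob, Tsit5(); abstol = 1e-8, reltol = 1e-8)

sol.u[end]                                       # ≈ ∫₀¹ g(t) dt, the outer integral
```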


Since the method-3 example works fine, I can first write the API for it, then try the above and change the API accordingly (or keep both methods). I could also go the other way round.

What do you suggest?

Ayushk4 avatar Nov 10 '19 05:11 Ayushk4

Yes, this sounds like a good direction to go down.

ChrisRackauckas avatar Nov 11 '19 04:11 ChrisRackauckas