
PINNs Lorenz param estim: kink in the output path of the predicted variables

Open · finmod opened this issue · 20 comments

Running the Lorenz param estim example, the estimated parameters are:

```
3-element Vector{Float64}:
 10.000484105285961
 27.99897160477475
 2.667291549846074
```

Then in the graphical analysis:

```julia
# Split the flat optimization result back into per-network parameter vectors.
initθ = discretization.init_params
acum = [0; accumulate(+, length.(initθ))]
sep = [acum[i]+1:acum[i+1] for i in 1:length(acum)-1]
minimizers = [res.minimizer[s] for s in sep]

# Evaluate each trained network on a fine time grid and overlay the ODE solution.
ts = [domain.domain.lower:dt/10:domain.domain.upper for domain in domains][1]
u_predict = [[discretization.phi[i]([t], minimizers[i])[1] for t in ts] for i in 1:3]
plot(sol)
plot!(ts, u_predict, label = ["x(t)" "y(t)" "z(t)"])
```

produces a kink in the predicted x variable, differing from the published graph, as shown below:

[Figure: predicted x(t), y(t), z(t) overlaid on the reference solution; the x(t) trajectory shows a visible kink.]

finmod · Apr 05 '21

What optimization sequence did you use? ADAM -> BFGS?
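
For reference, a minimal sketch of such an ADAM -> BFGS sequence, assuming the GalacticOptim interface of that era and the OptimizationProblem `prob` returned by `discretize(pde_system, discretization)`:

```julia
using GalacticOptim, Flux, Optim

# Stage 1: ADAM moves quickly into the right basin of attraction.
res = GalacticOptim.solve(prob, ADAM(0.01); maxiters = 2000)

# Stage 2: restart from ADAM's endpoint and let BFGS polish the optimum.
prob2 = remake(prob; u0 = res.minimizer)
res = GalacticOptim.solve(prob2, BFGS(); maxiters = 1000)
```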

ChrisRackauckas · Apr 05 '21

@finmod try running the code from the test https://github.com/SciML/NeuralPDE.jl/blob/2e9bac04111bb6a0049da1f26939b09984a53bcb/test/NNPDE_tests.jl#L480

KirillZubov · Apr 07 '21

The difference is that we train on more points in the tests.

```julia
# Train on a finer grid: dt/5 instead of dt.
ts = [domain.domain.lower:dt/5:domain.domain.upper for domain in domains][1]

function getData(sol)
    us = hcat(sol(ts).u...)   # 3×N matrix of observed states
    ts_ = hcat(sol(ts).t...)  # 1×N matrix of observation times
    return [us, ts_]
end
```
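
For context, the observations built by `getData` enter the optimization through the `additional_loss` hook, roughly along the lines of the docs tutorial of that era (a sketch: `sep` and `initθ` are the parameter-index bookkeeping from the plotting snippet above, and `chain` is the tuple of three networks):

```julia
# Sample the reference solution and measure each network's misfit against it.
u_, t_ = getData(sol)
len = length(t_)

function additional_loss(phi, θ, p)
    # Mean-squared distance between each predicted state and its data series.
    return sum(sum(abs2, phi[i](t_, θ[sep[i]]) .- u_[[i], :]) / len for i in 1:3)
end

discretization = NeuralPDE.PhysicsInformedNN(chain, NeuralPDE.GridTraining(dt);
                                             init_params = initθ,
                                             param_estim = true,
                                             additional_loss = additional_loss)
```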

KirillZubov · Apr 07 '21

@KirillZubov Correct. There is this one-line difference:

```julia
ts = [domain.domain.lower:dt/5:domain.domain.upper for domain in domains][1]
```

Let me run the test version to confirm.

finmod · Apr 07 '21

I wonder why the learning behavior differs from the example in the docs: whether it depends on the operating environment or something else.

KirillZubov · Apr 07 '21

We just shouldn't use GridTraining

ChrisRackauckas · Apr 07 '21

Also, try updating all packages to the latest versions and running the version from the documentation, to make sure the behavior doesn't depend on package versions.

KirillZubov · Apr 07 '21

With 101 observations, there are no kinks in the trajectories. The plots should be presented as scatter plots, so that the number of observations used in the parameter learning is immediately apparent.

@ChrisRackauckas This gives me an opportunity to make a comment on NNs and parameter estimation that I have been pondering for quite a long time. You have established that the strength of an NN approach to parameter estimation lies in model discovery from a small sample of observations. With LV and now Lorenz, you have repeatedly shown that with as few as 21 initial observations on each time series, the generating model can be successfully recovered. This is the major advantage of the NN approach, and it should be broadcast loudly for those applications that cannot replicate a data-generating experiment, as in economics and the social sciences.

The NN approach to parameter estimation is outstanding when the minimum requirement for regression (i.e. matrix inversion) is not met. But as soon as the number of observations is greater than 40, and here it is 101, any minimum-distance or minimum-variance estimator will be much more efficient, as you have shown in DiffEqBenchmarks/Tutorials.

For this issue, the kinks remain unexplained in the small-sample case of 21 observations. The hint about GridTraining should be investigated further.

finmod · Apr 07 '21

@zoemcc mentioned that this might not be solvable with GridTraining: it may require that the sampling adapts, since otherwise it will never "see" between the grid points. So that's why it might need to be swapped.

ChrisRackauckas · Apr 08 '21

I have run the example with dt varying from 0.01 to 0.05. As it stands, dt = 0.01 produces 101 observations per trajectory on the interval [0, 1]. The breakdown of GridTraining occurs between dt = 0.04 and dt = 0.05. At dt = 0.04, the number of observations is 26.
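
The observation counts follow directly from the grid spacing on [0, 1]:

```julia
length(0.0:0.01:1.0)  # 101 observations at dt = 0.01
length(0.0:0.04:1.0)  # 26 observations at dt = 0.04
length(0.0:0.05:1.0)  # 21 observations at dt = 0.05
```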

As mentioned above, parameter estimation by NN when the data series have 100 observations or more is a no-brainer, because there are more efficient methods in DiffEqParamEstim to solve this. Instead, using NN for parameter estimation in small samples, and in particular with few initial observations, is an exciting property of NNs for discovering a model when other methods fail. There was the LV example with 21 initial observations per data series, and now there is this second example on a chaotic ODE system.

Finally, @ChrisRackauckas is right to suggest preferring other discretization methods to GridTraining, particularly if they hold up at dt = 0.05. Also, the graphical analysis should produce a scatter of the data points used for estimation, not a fine-grid simulation of the model using the estimated parameters.
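
A minimal sketch of such a scatter overlay, assuming Plots.jl and the `getData` helper from the earlier comment:

```julia
using Plots

u_, t_ = getData(sol)

# Overlay the raw observations used for estimation on the reference solution,
# so the sample size is visible at a glance.
plot(sol; label = ["x(t)" "y(t)" "z(t)"])
scatter!(vec(t_), [u_[i, :] for i in 1:3];
         label = ["x data" "y data" "z data"], markersize = 3)
```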

finmod · Apr 19 '21

Generally I've found the best results with QuadratureTraining with ~1000-2000+ maxiters, and StochasticTraining with 128-256+ samples. GridTraining should basically never be used. The number of observations for parameter estimation here is orthogonal to the training strategy, since each training strategy maintains its own set of samples in the domain to train the PDE loss, whereas observational data enters through the additional_loss functionality as a separate concern.
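
A sketch of what those settings look like (constructor keywords as in the NeuralPDE.jl API of that era; exact names and defaults may differ across versions):

```julia
using NeuralPDE

# Quadrature-based training: adaptively integrate the PDE residual.
strategy = NeuralPDE.QuadratureTraining(; abstol = 1e-6, reltol = 1e-6,
                                        maxiters = 2000)

# Or stochastic training: draw this many fresh sample points per iteration.
# strategy = NeuralPDE.StochasticTraining(256)

discretization = NeuralPDE.PhysicsInformedNN(chain, strategy;
                                             param_estim = true,
                                             additional_loss = additional_loss)
```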

I agree that the analysis should provide more tools to show the samples used for training the PDE loss and it's something we'll be working towards.

zoemcc · Apr 20 '21

> We just shouldn't use GridTraining

It might be worth pointing that out in the docs. We could add a little caveat wherever GridTraining is being used (e.g. https://neuralpde.sciml.ai/stable/pinn/wave/).

The NeuralPDE paper has a great explanation of why one ought to avoid GridTraining most of the time.

killah-t-cell · Aug 03 '21

I think it would just be best to avoid putting it into tutorials. The tutorials should match how we expect users to be using the code, and given we don't recommend it, that's an uncomfortable disconnect.

ChrisRackauckas · Aug 03 '21

I suggest setting the best current defaults in the tutorials to reflect the findings of Section 5 of the cited paper: quasi-random sampling (QuasiRandomTraining) and ADAM+BFGS.
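
For illustration, such a default might look like the following (a sketch; the sampler name comes from QuasiMonteCarlo.jl), combined with the ADAM -> BFGS sequence sketched earlier in the thread:

```julia
using NeuralPDE, QuasiMonteCarlo

# Quasi-random (low-discrepancy) sampling of the training domain.
strategy = NeuralPDE.QuasiRandomTraining(256;
                                         sampling_alg = LatinHypercubeSample(),
                                         resampling = true)
```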

finmod · Aug 03 '21

Note that the quadrature methods were found to still be more robust, but indeed quasi-random sampling, once tuned, is quite fast. So either one works.

ChrisRackauckas · Aug 03 '21

@ChrisRackauckas Speaking of https://github.com/ChrisRackauckas/PINN_Quadrature, is it a case of "The check is in the post!"?

finmod · Aug 03 '21

What do you mean?

ChrisRackauckas · Aug 03 '21

You indicate in the "PINN for all problems" paper that the PINN_Quadrature tests are in your GitHub repo, but it is not yet uploaded.

finmod · Aug 03 '21

Oh, forgot to make it public 😅

ChrisRackauckas · Aug 03 '21

Great! Thanks

finmod · Aug 03 '21