NeuralPDE.jl
PINNs Lorenz param estim: kink in the output path of the predicted variables
Running the Lorenz param estim example, the estimated parameters are:
```
3-element Vector{Float64}:
 10.000484105285961
 27.99897160477475
  2.667291549846074
```
Then in the graphical analysis:
```julia
initθ = discretization.init_params
acum = [0; accumulate(+, length.(initθ))]
sep = [acum[i]+1:acum[i+1] for i in 1:length(acum)-1]
minimizers = [res.minimizer[s] for s in sep]
ts = [domain.domain.lower:dt/10:domain.domain.upper for domain in domains][1]
u_predict = [[discretization.phi[i]([t], minimizers[i])[1] for t in ts] for i in 1:3]
plot(sol)
plot!(ts, u_predict, label = ["x(t)" "y(t)" "z(t)"])
```
produces a kink in the predicted x variable, different from the published graph, as follows:
What optimization sequence did you use? ADAM -> BFGS?
@finmod try run the code from the test https://github.com/SciML/NeuralPDE.jl/blob/2e9bac04111bb6a0049da1f26939b09984a53bcb/test/NNPDE_tests.jl#L480
The difference is that we train on more points in the tests.
```julia
ts = [domain.domain.lower:dt/5:domain.domain.upper for domain in domains][1]
```
```julia
function getData(sol)
    data = []
    us = hcat(sol(ts).u...)
    ts_ = hcat(sol(ts).t...)
    return [us, ts_]
end
```
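For context, the observations returned by getData enter the PINN through an additional loss term. A rough sketch of that hookup, modeled on the documentation's parameter-estimation example (the names u_, t_, and sep are assumptions carried over from getData and the parameter-index bookkeeping shown earlier in this thread):

```julia
# Sketch: observational data entering the PINN loss. Assumes (us, ts_) come
# from getData(sol) above, renamed u_ and t_, and that sep holds the
# per-network parameter index ranges computed from init_params.
len = length(t_)
function additional_loss(phi, θ, p)
    # Mean squared error between each network's prediction and its data series.
    return sum(sum(abs2, phi[i](t_, θ[sep[i]]) .- u_[[i], :]) / len for i in 1:3)
end
```

This term is separate from the PDE residual loss, which is what the training strategy (GridTraining etc.) samples.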
@KirillZubov Correct. There is this one line difference:
```julia
ts = [domain.domain.lower:dt/5:domain.domain.upper for domain in domains][1]
```
Let me run the test version to confirm
I wonder why the learning behavior differs from the example in the docs: whether it depends on the operating environment or on something else.
We just shouldn't use GridTraining
Also, try updating all packages to the latest versions and running the version from the documentation, to make sure it doesn't depend on package versions.
With 101 observations, there are no kinks in the trajectories. The plots should be presented as scatter to immediately perceive the number of observations that is being used in the parameter learning.
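A minimal sketch of that scatter-style presentation with Plots.jl (assuming getData, ts, and u_predict as defined earlier in this thread; the labels are illustrative):

```julia
# Sketch: overlay the observations used for parameter estimation as a scatter,
# so the number of data points is immediately visible.
us, ts_ = getData(sol)   # us is 3×N, ts_ is 1×N
plot(ts, u_predict, label = ["x(t)" "y(t)" "z(t)"])
# Transpose us so each state variable becomes one scatter series.
scatter!(vec(ts_), us', label = ["x data" "y data" "z data"])
```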
@ChrisRackauckas This gives me an opportunity to make a comment on NN and parameter estimation that I have been pondering for quite a long time. You have established that the strength of an NN approach to parameter estimation is in model discovery from a small sample of observations. With LV and now Lorenz, you have repeatedly shown that for as few as 21 initial observations on each time series, the generating model can be successfully recovered. This is the major advantage of the NN approach, and it should be broadcast loudly for those applications that cannot replicate a data-generating experiment, as in economics and the social sciences.
The NN approach to parameter estimation is outstanding when the minimum requirement for regression (i.e. matrix inversion) is not met. But as soon as the number of observations is greater than 40 and here it is 101 then any minimum distance or minimum variance estimator will be much more efficient as you have shown in DiffEqBenchmarks/Tutorials.
For this issue, the kinks remain unexplained for the small sample case of 21 observations. The hint about GridTraining should be investigated further.
@zoemcc mentioned that this might not be able to be solved with GridTraining: it may require that the sampling adapts since otherwise it'll never "see" between the grid points. So that's why it might need to be swapped.
I have run the example with dt varying from 0.01 to 0.05. As it is now, dt = 0.01 produces 101 observations per trajectory in the interval [0,1]. The breakdown of GridTraining occurs between dt = 0.04 and dt = 0.05. At dt = 0.04, the number of observations is 26.
As mentioned above, parameter estimation by NN when the data series have 100 observations or more is a no-brainer, because there are more efficient methods in DiffEqParamEstim to solve this. Instead, using NN for parameter estimation in small samples, and in particular with few initial observations, is an exciting property of NN in discovering a model when other methods fail. There was the LV example with 21 initial observations per data series, and now there is this second example on a chaotic ODE system.
Finally, @ChrisRackauckas is right to suggest preferring other discretization methods to GridTraining, particularly if they hold at dt=0.05. Also the graphical analysis should produce a scatter of the data points used for estimation and not a fine grid simulation of the model using the estimated parameters.
Generally I've found the best results with QuadratureTraining with ~1000-2000+ maxiters and StochasticTraining with 128-256+ samples. GridTraining should basically never be used. The number of observations for parameter estimation here is orthogonal to the training strategy, since each training strategy maintains its own set of samples in the domain to train the PDE loss, whereas observational data enters through the additional_loss functionality as a separate concern.
I agree that the analysis should provide more tools to show the samples used for training the PDE loss and it's something we'll be working towards.
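Concretely, swapping the strategy is a one-line change when constructing the discretization. A sketch under the NeuralPDE API of the time (the keyword arguments and sample counts here are illustrative, not prescriptive):

```julia
# Sketch: replace GridTraining with one of the recommended strategies.
# strategy = NeuralPDE.GridTraining(dt)           # not recommended
strategy = NeuralPDE.QuadratureTraining()         # robust default
# strategy = NeuralPDE.StochasticTraining(256)    # draws fresh points, so training
#                                                 # "sees" between any fixed grid
discretization = NeuralPDE.PhysicsInformedNN(chain, strategy;
                                             init_params = initθ,
                                             param_estim = true,
                                             additional_loss = additional_loss)
```

The resampling strategies address the earlier point by @zoemcc: a fixed grid never evaluates the residual between its nodes, so kinks there go unpenalized.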
> We just shouldn't use GridTraining
It might be worth pointing that out in the docs. We could add a little caveat wherever GridTraining is being used (e.g. https://neuralpde.sciml.ai/stable/pinn/wave/)
The NeuralPDE paper has a great explanation of why one ought to avoid GridTraining most of the time.
I think it would just be best to avoid putting it into tutorials. The tutorials should match how we expect users to be using the code, and given we don't recommend it, that's an uncomfortable disconnect.
I suggest setting the best current defaults in the tutorials to reflect the findings of section 5 of the cited paper: QuasiRandomTraining and ADAM + BFGS.
Note that the quadrature methods were found to still be more robust, but indeed quasi-random sampling once tuned is quite fast. So either one works.
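The ADAM + BFGS sequence discussed above is a warm-start pattern: a sketch under the GalacticOptim-era API used by these examples (the learning rate and maxiters values are illustrative):

```julia
# Sketch: coarse optimization with ADAM, then refinement with BFGS,
# restarting from the ADAM minimizer.
res = GalacticOptim.solve(prob, ADAM(0.1); maxiters = 1000)
prob2 = remake(prob, u0 = res.minimizer)
res = GalacticOptim.solve(prob2, BFGS(); maxiters = 500)
```

ADAM escapes the poor initial loss landscape cheaply; BFGS then converges quickly near the minimum, where its quadratic model is accurate.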
@ChrisRackauckas Speaking of https://github.com/ChrisRackauckas/PINN_Quadrature, is it a case of "The check is in the post!"?
what do you mean?
You indicate in the PINN-for-all-problems paper that the PINN_Quadrature tests are in your GitHub repo, but it is not yet uploaded.
Oh, forgot to make it public 😅
Great! Thanks