ParameterEstimocean.jl
ParameterEstimocean.jl copied to clipboard
Large forward maps make Julia crash
When trying to calibrate to LESbrary simulations, the forward map typically has over half a million elements, making it impossible to store a covariance matrix for EKI. How do you guys want to handle this? We could
- Look into using sparse matrices.
- Store fewer time steps in
observations
. This would make it harder to animate the solution downstream. - Add a
time_range
attribute toConcatenatedOutputMap
that specifies which time steps to track--defaults to all time steps. - Delete
OceanTurbulenceParameterEstimation.jl
and sip guava juice from Costco while brainstorming alternative research topics.
@glwagner @navidcy What do you guys think?
Oh! That's why Julia was unexpectedly quitting?!
I think I was running onto similar problem when I tried to calibrate the 3D model....
Let's do 4. I don't have Costco near me though. Problem!
How about 3?
Sparse matrices sounds good but also sounds bit like a mini-research endeavour?
I just realized that EnsembleKalmanProcesses.jl
can't take non-Matrix
types for the covariance matrix, so I'm also leaning towards 3.
I'd vote for using existing tools to solve this problem without changing source code, which is option 2, so:
i. Restrict comparison times using the times
keyword argument when constructing OneDimensionalTimeSeries
;
ii. When evaluating a parameterization / animating results, add more entries to times
and build a new InverseProblem
from the existing one (just changing out the new observations
); or
iii. Add output writers to ensemble_simulation
(this can be done just by appending an output writers to an existing InverseProblem.simulation
)
We'll want to do iii anyways, especially when the simulations get more expensive.
Another option is to add a new output map that computes some kind of loss (perhaps the easiest is just the norm of the difference between forward output and observations)?
I just realized that
EnsembleKalmanProcesses.jl
can't take non-Matrix
types for the covariance matrix, so I'm also leaning towards 3.
Eg we can't use sparse matrices? Damn the maps.
Let's do 4. I don't have Costco near me though. Problem!
We can fix that
We can also add a convenience utility that rebuilds an inverse problem with more times, eg something like
new_ip = with_observation_times(old_ip, new_times)
Just a note: we implemented ensemble transform Kalman inversion in EKP: https://github.com/CliMA/EnsembleKalmanProcesses.jl/pull/329
This variant of EKI not have the problem described here. In particular, the covariance matrix never has to be formed, and the time complexity will scale linearly with observation size.
That sounds useful @eviatarbach. It looks like the noise covariance is used here:
https://github.com/CliMA/ParameterEstimocean.jl/blob/a9dbf709eeeba5bfe223b8bb0e407434e332d76e/src/EnsembleKalmanInversions.jl#L500-L505
Can you show us how we would use ensemble transform Kalman inversion instead?
Unfortunately, looking at the code, it seems that when using any adaptive stepping schemes the use of the noise covariance during the EKI update is a bit tangled up with the time-step estimation. The code could use a little refactoring to create a separation of concerns there.
(The interface for "no adaptive timestepping" may be broken right now --- I'm not sure its tested)
Ah yes, you still need to provide the noise covariance (in fact, its inverse); ETKI will only have better time complexity than EKI in the case that this has a simple structure, ideally diagonal. If this is satisfied, then I would try, using the latest dev version of EKP (haven't created a new release yet), to just create an EKI struct with argument ensemble_kalman_process = TransformInversion(inv_Γ)
and see if it works automatically.
If this runs into problems I would be happy to help troubleshoot.
Ah ok, so this might work now with user input here:
https://github.com/CliMA/ParameterEstimocean.jl/blob/a9dbf709eeeba5bfe223b8bb0e407434e332d76e/src/EnsembleKalmanInversions.jl#L168
PS the distinction between the "process" and EnsembleKalmanProcess
is a little confusing (same name for different things?) how should we think about this?
Yes, exactly.
I agree, the naming is confusing: in one case, it refers to the struct which indicates what type of inversion to use, e.g. Inversion()
or Unscented()
. In the other case, it refers to the struct holding the entire history of the ensembles, etc. I think it might be good to rename the former to inversion_type
or something like that.
Mm yeah! We avoid names with "type" in them for julia code (too easy to confused with julia types) but something like that might work. Unscented
is a kind of Inversion
right? But Inversion
and Unscented
are different?
Good point on the naming. Yes, Unscented
is a variant of EKI, whereas Inversion
is regular EKI. There is also TransformInversion
, Sampler
, etc.