ParameterEstimocean.jl icon indicating copy to clipboard operation
ParameterEstimocean.jl copied to clipboard

Large forward maps make Julia crash

Open adelinehillier opened this issue 3 years ago • 16 comments

When trying to calibrate to LESbrary simulations, the forward map typically has over half a million elements, making it impossible to store a covariance matrix for EKI. How do you guys want to handle this? We could

  1. Look into using sparse matrices.
  2. Store fewer time steps in observations. This would make it harder to animate the solution downstream.
  3. Add a time_range attribute to ConcatenatedOutputMap that specifies which time steps to track--defaults to all time steps.
  4. Delete OceanTurbulenceParameterEstimation.jl and sip guava juice from Costco while brainstorming alternative research topics.

@glwagner @navidcy What do you guys think?

adelinehillier avatar Nov 19 '21 01:11 adelinehillier

Oh! That's why Julia was unexpectedly quitting?!

I think I was running onto similar problem when I tried to calibrate the 3D model....

navidcy avatar Nov 19 '21 01:11 navidcy

Let's do 4. I don't have Costco near me though. Problem!

navidcy avatar Nov 19 '21 01:11 navidcy

How about 3?

Sparse matrices sounds good but also sounds bit like a mini-research endeavour?

navidcy avatar Nov 19 '21 01:11 navidcy

I just realized that EnsembleKalmanProcesses.jl can't take non-Matrix types for the covariance matrix, so I'm also leaning towards 3.

adelinehillier avatar Nov 19 '21 01:11 adelinehillier

I'd vote for using existing tools to solve this problem without changing source code, which is option 2, so:

i. Restrict comparison times using the times keyword argument when constructing OneDimensionalTimeSeries; ii. When evaluating a parameterization / animating results, add more entries to times and build a new InverseProblem from the existing one (just changing out the new observations); or iii. Add output writers to ensemble_simulation (this can be done just by appending an output writers to an existing InverseProblem.simulation)

We'll want to do iii anyways, especially when the simulations get more expensive.

glwagner avatar Nov 19 '21 02:11 glwagner

Another option is to add a new output map that computes some kind of loss (perhaps the easiest is just the norm of the difference between forward output and observations)?

glwagner avatar Nov 19 '21 02:11 glwagner

I just realized that EnsembleKalmanProcesses.jl can't take non-Matrix types for the covariance matrix, so I'm also leaning towards 3.

Eg we can't use sparse matrices? Damn the maps.

glwagner avatar Nov 19 '21 02:11 glwagner

Let's do 4. I don't have Costco near me though. Problem!

We can fix that

glwagner avatar Nov 19 '21 02:11 glwagner

We can also add a convenience utility that rebuilds an inverse problem with more times, eg something like

new_ip = with_observation_times(old_ip, new_times)

glwagner avatar Nov 19 '21 02:11 glwagner

Just a note: we implemented ensemble transform Kalman inversion in EKP: https://github.com/CliMA/EnsembleKalmanProcesses.jl/pull/329

This variant of EKI not have the problem described here. In particular, the covariance matrix never has to be formed, and the time complexity will scale linearly with observation size.

eviatarbach avatar Oct 03 '23 12:10 eviatarbach

That sounds useful @eviatarbach. It looks like the noise covariance is used here:

https://github.com/CliMA/ParameterEstimocean.jl/blob/a9dbf709eeeba5bfe223b8bb0e407434e332d76e/src/EnsembleKalmanInversions.jl#L500-L505

Can you show us how we would use ensemble transform Kalman inversion instead?

Unfortunately, looking at the code, it seems that when using any adaptive stepping schemes the use of the noise covariance during the EKI update is a bit tangled up with the time-step estimation. The code could use a little refactoring to create a separation of concerns there.

(The interface for "no adaptive timestepping" may be broken right now --- I'm not sure its tested)

glwagner avatar Oct 03 '23 13:10 glwagner

Ah yes, you still need to provide the noise covariance (in fact, its inverse); ETKI will only have better time complexity than EKI in the case that this has a simple structure, ideally diagonal. If this is satisfied, then I would try, using the latest dev version of EKP (haven't created a new release yet), to just create an EKI struct with argument ensemble_kalman_process = TransformInversion(inv_Γ) and see if it works automatically.

If this runs into problems I would be happy to help troubleshoot.

eviatarbach avatar Oct 03 '23 14:10 eviatarbach

Ah ok, so this might work now with user input here:

https://github.com/CliMA/ParameterEstimocean.jl/blob/a9dbf709eeeba5bfe223b8bb0e407434e332d76e/src/EnsembleKalmanInversions.jl#L168

PS the distinction between the "process" and EnsembleKalmanProcess is a little confusing (same name for different things?) how should we think about this?

glwagner avatar Oct 03 '23 14:10 glwagner

Yes, exactly.

I agree, the naming is confusing: in one case, it refers to the struct which indicates what type of inversion to use, e.g. Inversion() or Unscented(). In the other case, it refers to the struct holding the entire history of the ensembles, etc. I think it might be good to rename the former to inversion_type or something like that.

eviatarbach avatar Oct 03 '23 14:10 eviatarbach

Mm yeah! We avoid names with "type" in them for julia code (too easy to confused with julia types) but something like that might work. Unscented is a kind of Inversion right? But Inversion and Unscented are different?

glwagner avatar Oct 03 '23 14:10 glwagner

Good point on the naming. Yes, Unscented is a variant of EKI, whereas Inversion is regular EKI. There is also TransformInversion, Sampler, etc.

eviatarbach avatar Oct 03 '23 15:10 eviatarbach