updating localizer gives error, prevents further execution?

Open wkitlasten opened this issue 2 years ago • 19 comments

I am running a set of priors with no par upgrades. I have some prior data conflict and ies_drop_conflicts is set to "True." Certain sets of pars are specific to single obs.

I am getting the following error. I suspect it comes about if an obs is dropped due to pdc and the entire column of the localization matrix becomes 0's? The error does not occur with ies_drop_conflicts=False. The original localization matrix loads fine during the initialization.

What is happening during the updating localizer step that might be causing this?

import pandas as pd
loc = pd.read_csv('we_200_loc.csv')
loc.shape
(2342, 3314)

But the report suggests it is expecting 6626 entries?

  ---  WARNING: 81 non-zero weighted observations are in conflict with the prior simulated ensemble.
  ---
...see rec file or prior_we_200.pdc.csvfor listing of conflicted observations

...dropping conflicted observations
...number of non-zero weighted observations reduced from 2344 to 2263

...updating localizer
Error condition prevents further execution:
Matrix.from_csv() error: wrong number of entries on line 0 , expecting 6626, found 3313
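
One way to cross-check the shape reported above is to count the entries on each line of the CSV directly, without pandas. `pd.read_csv` without `index_col=0` treats the row-name column as data, so the `(2342, 3314)` shape probably corresponds to 3313 localizer columns plus one name column, which would match the "found 3313" in the error. The sketch below assumes the usual layout (a header row of column names, then one row per observation or group with its name in the first field) and runs on a tiny in-memory stand-in for the file. The helper name `field_counts` is hypothetical and is not part of pyemu or PEST++.

```python
import csv
import io


def field_counts(handle):
    """Return the set of per-line entry counts, excluding the leading name field."""
    reader = csv.reader(handle)
    return {len(row) - 1 for row in reader if row}


# Tiny stand-in for a localizer CSV (names are made up for illustration).
sample = io.StringIO(
    "obs,pargp1,pargp2,pargp3\n"
    "obgnme1,1.0,0.0,0.0\n"
    "obgnme2,0.0,1.0,1.0\n"
)
counts = field_counts(sample)
print(counts)  # a single value means every line holds the same number of entries
```

For the real file, the same function could be called on `open('we_200_loc.csv', newline='')`. More than one value in the result would point to ragged lines in the file itself rather than to the update step.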

wkitlasten avatar Aug 20 '22 21:08 wkitlasten

This is an interesting interaction between components. Lemme see what tests exist that might be covering this and try to force this situation. The fact that there is an "updating localizers" makes me think ies is at least trying to cope with PDC and a localizer...

jtwhite79 avatar Aug 20 '22 21:08 jtwhite79

Looks like this is related to using a csv file format for the localizer. I'm working on a patch now, but to keep moving, you should be able to swap to an ascii .mat or binary .jcb format...

jtwhite79 avatar Aug 22 '22 19:08 jtwhite79

Cool... I will give that a try.

wkitlasten avatar Aug 22 '22 21:08 wkitlasten

My setup works with no localizer, but obviously I need it!

It works with a .csv localizer, as long as it isn't updated due to dropping PDC.

But when I try the .jcb option I get:

...initializing localizer
reading 3982040 elements, 2342 rows, 4040 columns
invalid 'j':1833888 at 4294967295 data:0 i: 1599
invalid 'j':1833888 at 4294967294 data:0 i: 1598
invalid 'j':1833888 at 4294967293 data:0 i: 1597
...

To get my .jcb I am doing the following using pyemu, with df.index = weighted obs groups and df.columns = adjustable parameter groups:

m = pyemu.Matrix.from_dataframe(df)
m.to_binary('loc.jcb')

What am I missing?

wkitlasten avatar Aug 25 '22 22:08 wkitlasten

Try m.to_coo() - you probably have so many possible elements in the localizer that the original Jacobian format goes out of int range - coo is the "enhanced" binary format... I've got the csv issue fixed on feat_spq3 on my fork. I can make a release if you wanna stay with .csv

jtwhite79 avatar Aug 26 '22 01:08 jtwhite79

Doesn't seem to like my coo using m.to_coo('we_200_loc.coo'). Maybe I am missing something?

.log:
14:56:39,3.87e-05,loading localizer matrix from file we_200_loc.coo

.rec:
...maxsing: 10000000
...eigthresh: 1e-06
...initializing localizer

Not much else I can see...

wkitlasten avatar Aug 26 '22 03:08 wkitlasten

Can you give it the .jcb extension - ies still relies on the extension to decide what to do with the file (the coo part is determined in the first three integers in the binary file...)

jtwhite79 avatar Aug 26 '22 03:08 jtwhite79

Der... pest error window was pushed back behind others.

Localizer loaded fine with the .jcb extension. Unfortunately it still fails when trying to update the localizer. I'm just going to unweight those obs for now.

m.to_coo('we_200_loc.jcb')

wkitlasten avatar Aug 26 '22 04:08 wkitlasten

What are your thoughts on handling zero weight obs and fixed pars in the localizer within pest? Unless I'm mistaken, every time you set weights to zero or fix pars you have to rebuild the localizer, right? What about having a single nobs x npar (nobgnme x npargp) localizer and having pest drop rows and columns where appropriate?

When pest drops conflicting obs, isn't it just setting weights to zero? And if the removal of those obs results in a par or pargp column having all zeros, drop that column too?

(FWIW, I would have been burned by such behavior initially... but it seems convenient now!!)

wkitlasten avatar Aug 27 '22 22:08 wkitlasten

These are good questions. First off, the localizer can have more obs (groups) and par (groups) than what are currently nonzero weighted and adjustable. So you can build up a "full" localizer and then can turn off weights and fix/tie pars as needed.

And you are right about the conflicts - if a par (group) is only allowed to be conditioned from a single obs (group) and that obs (group) is "dropped" the localizer just removes that column entirely so that par (group) is implicitly fixed.
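
The behaviour described above can be illustrated with a minimal sketch: start from a "full" localizer, keep only the rows and columns that are currently active, then drop any column left with all zeros. This illustrates the idea only and is not the PEST++ implementation. The function name `trim_localizer`, the nested-dict layout, and the group names are all hypothetical.

```python
def trim_localizer(loc, active_rows, active_cols):
    """Keep only active rows/columns, then drop columns left with all zeros.

    loc maps row name -> {column name -> value}; missing entries count as 0.
    """
    kept_rows = [r for r in loc if r in active_rows]
    # Collect column names in first-seen order across all rows.
    all_cols = []
    for r in loc:
        for c in loc[r]:
            if c not in all_cols:
                all_cols.append(c)
    kept_cols = [
        c for c in all_cols
        if c in active_cols
        and any(loc[r].get(c, 0.0) != 0.0 for r in kept_rows)
    ]
    return {r: {c: loc[r].get(c, 0.0) for c in kept_cols} for r in kept_rows}


full = {
    "obgp_a": {"pargp_1": 1.0, "pargp_2": 0.0},
    "obgp_b": {"pargp_1": 0.0, "pargp_2": 1.0},
}
# obgp_a is dropped (zero weight or prior-data conflict); both par groups stay adjustable.
trimmed = trim_localizer(full, active_rows={"obgp_b"}, active_cols={"pargp_1", "pargp_2"})
print(trimmed)
```

Here `pargp_1` was conditioned only by `obgp_a`, so once that row is gone its column disappears too and the group is implicitly fixed.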

jtwhite79 avatar Aug 27 '22 23:08 jtwhite79

Makes perfect sense. I've been building a new localizer every time I make weights zero and/or fix pars... otherwise I seem to be getting the same error during the "initializing localizer" step. Maybe there is a glitch in the matrix?

wkitlasten avatar Aug 28 '22 02:08 wkitlasten

"First off, the localizer can have more obs (groups) and par (groups) than what are currently nonzero weighted and adjustable. So you can build up a "full" localizer and then can turn off weights and fix/tie pars as needed."

So starting over from the beginning. I grabbed the latest exe I could find and rebuilt my localizer to include all my obgnme and pargp, regardless of weight or partrans (saved with pyemu.Matrix.to_coo()). pestpp-ies just stops with no warnings or errors. Nothing in the .log or .rec file beyond processing/initializing localizer. No "pest stopped working" window. Just exits back to a dos prompt. What am I missing?

version: 5.1.20
binary compiled on Aug 12 2022 at 13:27:02

started at 08/31/22 18:19:20
...processing command line: ' pestpp-ies.exe prior_we_200.pst'
...using serial run manager

using control file: "prior_we_200.pst"
in directory: "D:\modelling\wairau\we_200_temp - Copy"
on host: "WAIW22736"

processing control file prior_we_200.pst
checking model IO files...done
              starting serial run manager ...


  ---  initializing  ---

  ---  using glm algorithm  ---
...using REDSVD for truncated svd solve
...maxsing: 10000000
...eigthresh:  1e-06
...initializing localizer
reading 1393204 elements, 3742 rows, 4184 columns

I confirmed:

len(obs.obgnme.unique())
3742
a.index.isin(obs.obgnme.unique()).all()
True
len(par.pargp.unique())
4184
a.columns.isin(par.pargp.unique()).all()
True
a.shape
(3742, 4184)

I'm not sure where to go from there. Suggestions?
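
The checks above confirm that every localizer name exists in the control file. They do not show whether each listed group still has at least one nonzero-weighted observation or at least one adjustable parameter, which may be worth checking as well. A minimal sketch of that extra check follows, using plain tuples as stand-ins for the `obgnme`/`weight` and `pargp`/`partrans` columns. The helper name `inactive_groups` and the sample values are hypothetical, and treating both `fixed` and `tied` as non-adjustable is an assumption.

```python
def inactive_groups(records, is_active):
    """Return sorted group names for which no record satisfies is_active."""
    groups = {}
    for group, value in records:
        groups[group] = groups.get(group, False) or is_active(value)
    return sorted(g for g, active in groups.items() if not active)


# Hypothetical stand-ins for the observation and parameter data.
obs_records = [("heads", 1.0), ("heads", 0.0), ("flux", 0.0), ("flux", 0.0)]
par_records = [("hk", "log"), ("hk", "fixed"), ("rch", "fixed"), ("sy", "tied")]

zero_weight_groups = inactive_groups(obs_records, lambda w: w > 0.0)
non_adjustable_groups = inactive_groups(
    par_records, lambda t: t not in ("fixed", "tied")
)
print(zero_weight_groups, non_adjustable_groups)
```

With the pandas DataFrames used above, a rough equivalent would be `set(obs.obgnme) - set(obs.loc[obs.weight > 0, "obgnme"])`, and similarly for `par.pargp` filtered on `partrans`.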

(edit: same behavior with .csv version)
(edit: edit: welcome to the Ministry of Silly Localizers)
[image]

wkitlasten avatar Aug 31 '22 06:08 wkitlasten

Alright, you caught me on this one. Internally the localizer will accept a forgive flag, and that is used when updating it after PDC drops and also by pestpp-da. But that option was not exposed as a control file option. It's been added in 5.1.21 but it's not in the documentation (at least yet) because I think it could get people in trouble. If you wanna use it - ies_localizer_forgive_extra(true).

I'll look at why the exceptions thrown by the localizer aren't being echoed to stdout or the rec file...

Nice image BTW!

jtwhite79 avatar Aug 31 '22 18:08 jtwhite79

Der... got too excited when I saw it was reading the localizer and went for coffee. Unfortunately it still stops unexpectedly:

(py37) D:\modelling\wairau\we_200_temp - Copy>pestpp-ies.exe prior_we_200.pst


             pestpp-ies: a GLM iterative ensemble smoother

                   by the PEST++ development team


version: 5.1.21
binary compiled on Aug 30 2022 at 14:32:08

started at 09/01/22 07:56:21
...processing command line: ' pestpp-ies.exe prior_we_200.pst'
...using serial run manager

using control file: "prior_we_200.pst"
in directory: "D:\modelling\wairau\we_200_temp - Copy"
on host: "WAIW22736"

processing control file prior_we_200.pst
checking model IO files...done
              starting serial run manager ...


  ---  initializing  ---

  ---  using glm algorithm  ---
...using REDSVD for truncated svd solve
...maxsing: 10000000
...eigthresh:  1e-06
...initializing localizer
reading 1393204 elements, 3742 rows, 4184 columns

(py37) D:\modelling\wairau\we_200_temp - Copy>

PCF (concise is nice!):

pcf version=2
* control data keyword
pestmode                                 estimation
noptmax                                 -1
nphinored                               5
svdmode                                 1
maxsing                          10000000
eigthresh                           1e-06
eigwrite                                1
ies_parameter_ensemble         prior_we_200.prior_draw.jcb
ies_drop_conflicts             true
overdue_giveup_fac             10
overdue_giveup_minutes         60
ies_save_binary                true
ies_ordered_binary             false
ies_localizer                  we_200_loc.coo.jcb
ies_num_reals   2
ies_localizer_forgive_extra    true
* parameter groups external
prior_we_200.pargp_data.csv
* parameter data external
prior_we_200.par_data.csv
* observation data external
prior_we_200.obs_data.csv
* model command line
python forward_run.py
* model input external
prior_we_200.tplfile_data.csv
* model output external
prior_we_200.insfile_data.csv

Nothing in rec or log beyond initializing/processing localizer.

Seems to work fine if I manually drop zero weight obs from the localizer.

wkitlasten avatar Aug 31 '22 20:08 wkitlasten

@wkitlasten can you zip that master dir and post it?

jtwhite79 avatar Aug 31 '22 20:08 jtwhite79

Hi @jtwhite79 and @wkitlasten. I might be able to add a bit more info here. I have run a few tests with an older pestpp-ies version (5.1.15). Obviously this is without the forgive flag, but on the Mac if I de-weight an observation (that is in the localiser), I get the following error:

terminating with uncaught exception of type std::runtime_error: Localizer::process_mat() error: the following rows in were not found in the non-zero-weight observation names or observation group names: ...

I have yet to test the forgive flag with version 5.1.21, and I also need to test whether this is triggered after PDC (when IES is effectively doing the de-weighting).

briochh avatar Sep 01 '22 00:09 briochh

Ok, so obs dropped during PDC are tolerated when updating the localizer if localiser is just name-based (as opposed to group-based).

The issue arises if using a group-based localiser and an entire group is in conflict (and dropped):

libc++abi: terminating with uncaught exception of type std::runtime_error: Localizer::process_mat() error: listed observation group 'DIFFERENTGP' has no non-zero weight observations

again using 5.1.15.

UPDATE: seeing the same behaviour with 5.1.21 and using ++ies_localizer_forgive_extra(True)

briochh avatar Sep 01 '22 01:09 briochh

ugh. That is the edge case that is not being tested and, surprise, it is not supported. Testing a patch now...

jtwhite79 avatar Sep 01 '22 17:09 jtwhite79

@briochh provided a version that works when all obs in a group are dropped/zero wt, but I seem to be hitting a similar issue when all pars in a group are fixed. Has that been tested? Sorry for living on the edge.

wkitlasten avatar Sep 03 '22 21:09 wkitlasten

I just tested that on a very simple problem and it seemed to work as expected: the localizer lists a parameter group that has all fixed partrans - is this what you have too, Wes? Here is the line from the log file:

2022-09-03 16:37:47,0.000252,dropped 1 from localizer columns because forgive_missing is true

Here is the org localizer:

       2       2       2
  1.0000000E+00  0.0000000E+00
  0.0000000E+00  1.0000000E+00
* row names
group1
group2
* column names
group1
group2

And here is the initialized localizer:

     2     1     2
0
1
* row names
group1
group2
* column names
group2

(parameter group group1 has all fixed pars)

If you set verbose > 1, the (re)initialized localizer gets written to "initialized_localizer" so you can check it...
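
To compare the original and (re)initialized localizers programmatically, a small reader for the ASCII layout shown above can be sketched as follows. It assumes the header line holds the row count, column count, and a code of 2 (separate row and column name lists), as in both listings. The function name `read_ascii_matrix` is hypothetical, and pyemu's own matrix readers would be the more robust choice for real files.

```python
def read_ascii_matrix(text):
    """Parse an ASCII matrix with separate row and column name lists (code 2)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    nrow, ncol, icode = (int(tok) for tok in lines[0].split())
    if icode != 2:
        raise ValueError("this sketch only handles a header code of 2")
    row_marker = lines.index("* row names")
    col_marker = lines.index("* column names")
    values = [float(tok) for ln in lines[1:row_marker] for tok in ln.split()]
    if len(values) != nrow * ncol:
        raise ValueError("unexpected number of entries")
    rows = lines[row_marker + 1:col_marker]
    cols = lines[col_marker + 1:]
    data = [values[i * ncol:(i + 1) * ncol] for i in range(nrow)]
    return rows, cols, data


original = """
       2       2       2
  1.0000000E+00  0.0000000E+00
  0.0000000E+00  1.0000000E+00
* row names
group1
group2
* column names
group1
group2
"""
initialized = """
     2     1     2
0
1
* row names
group1
group2
* column names
group2
"""
org_rows, org_cols, org_data = read_ascii_matrix(original)
new_rows, new_cols, new_data = read_ascii_matrix(initialized)
dropped_cols = [c for c in org_cols if c not in new_cols]
print(dropped_cols)
```

The same comparison on the row names would list any observation groups that were dropped.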

jtwhite79 avatar Sep 03 '22 21:09 jtwhite79

Der... forgot the flag. Testing pargps now.

wkitlasten avatar Sep 04 '22 00:09 wkitlasten

If I add ies_localizer_forgive_extra True to the ctrl file, it at least makes it past the localizer initialization:

reading 1393518 elements, 3743 rows, 4185 columns
dropped 1401 from localizer rows because forgive_missing is true

jtwhite79 avatar Sep 04 '22 00:09 jtwhite79

I am now able to use a "full" localizer when all obs in a single observation group have zero weights and/or when all pars in a parameter group are fixed... as long as I remember to set the ies_localizer_forgive_extra True flag :)

Thanks for addressing this.

wkitlasten avatar Sep 05 '22 23:09 wkitlasten