pyemu
AttributeError: 'Pandas' object has no attribute 'chkpar'
I am trying to do a forward (noptmax=0) run of a model and I get the error AttributeError: 'Pandas' object has no attribute 'chkpar', which occurs immediately upon execution of the forward.py script. Here is the error output:
starting list mlt 2024-03-19 11:59:12.021732
number of chunks to process: 9
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 2048, in _process_chunk_list_files
    raise e
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 2045, in _process_chunk_list_files
    _process_list_file(model_file, df)
  File "C:\Users\jtraylor\AppData\Local\miniforge3\envs\gis\Lib\site-packages\pyemu\utils\helpers.py", line 2177, in _process_list_file
    assert len(common_idx) == mlt.chkpar, (
                              ^^^^^^^^^^
AttributeError: 'Pandas' object has no attribute 'chkpar'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\projects\D_GV6800\analysis\model\cache_cal_testing\cache500eastcr_forward\forward_run.py", line 12, in
My pyemu version is 1.3.3+6.g3322178a. My pandas version is 2.1.1. This model was made with older versions of pyemu and pandas. Is this a versioning issue between pyemu and pandas?
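For readers wondering where a 'Pandas' object comes from: the message is what pandas produces when a row yielded by DataFrame.itertuples() is asked for a column the frame does not have, since those rows are namedtuples whose class name defaults to "Pandas". The sketch below reproduces the message with a made-up table; the column names and file names are hypothetical, and it is only an assumption that the table pyemu reads at run time was written without a chkpar column.

```python
import pandas as pd

# Hypothetical stand-in for a multiplier-info table that has no "chkpar" column
old_info = pd.DataFrame(
    {"model_file": ["wel_0.dat"], "mlt_file": ["mult/wel_grid.csv"]}
)

# itertuples() yields namedtuples; their class name defaults to "Pandas"
row = next(old_info.itertuples())
row_type = type(row).__name__
print(row_type)

try:
    row.chkpar  # attribute lookup for a column that does not exist
except AttributeError as err:
    message = str(err)
    print(message)
```

If this is what is happening, checking whether "chkpar" appears in the columns of the table being iterated would confirm it quickly.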
Did you rerun the PstFrom pest-interface construction with the new version of pyemu?
I have not, but Moussa had a similar issue and he rebuilt everything from scratch w/ the latest versions of packages and is now having an issue running his model.
I'm going to try to use pyemu version 1.2.0 and older pandas
Tried running with pyemu 1.2.0 and pandas 2.0.2 and still get the same error. This is all with Python 3.11. It did run for a colleague who is still using Python 3.9, so I'm wondering if it's a compatibility issue with newer Python?
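Since several Python, pandas and pyemu combinations are being compared in this thread, it can help to print the versions from inside the same environment that runs forward_run.py. This is a minimal sketch using only the standard library; the report_versions helper is hypothetical and not part of pyemu.

```python
import sys
from importlib import metadata


def report_versions(packages=("pyemu", "pandas", "numpy")):
    """Collect the interpreter version and installed package versions."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


versions = report_versions()
for name, version in versions.items():
    print(f"{name}: {version}")
```

Running it both where the interface is built and where the forward run happens shows whether the two environments actually match.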
So just to be clear, you reran the python functions that build the pest interface with PstFrom (essentially rebuilt the control file, template files, instruction files, etc.)? And this was done using the same version of pyemu/pandas that you are using at runtime that calls apply_genericlist_pars()? We have several CI tests that cover this use case with PstFrom and they seem to be OK, and I know that the chkpar bit was added recently as a runtime check to make sure things are happening as expected. So if you didn't rebuild the interface with the updated pyemu/pandas, I can see where the issues you are having would come from...
Yes, I rebuilt the entire pest framework (PstFrom) from scratch, including observations. All of the rebuilding worked fine. The problem occurs when we try to do a forward run with pestpp using forward_run.py.
A couple of colleagues and I get the same error, as follows:
Traceback (most recent call last):
File "D:\projects\D_GV6800\analysis\model\cache_cal_testing\cache-pest\cache500eastcr\forward_run.py", line 12, in
OK, that's a new error from before and I think it's related to the version of pandas (maybe). What version of pandas are you using?
im using: pandas 2.1.1 py311hf63dbb6_0 conda-forge
hmm. That should be ok. Can you zip up your model+pest files and post them somewhere so that I can check them out?
can i email them to you?
OK, I think I see the issue (@briochh would know better): it looks like you are broadcasting grid-type multiplier parameters for the wel package across multiple stress periods, which isn't a problem in and of itself, except that you have differing numbers of well package entries across the stress periods. I think this is what is tripping things up, because the number and location of individual well entries (i.e. grid-type parameters) jumps from 420 in the first stress period to 1042 in the second stress period, which looks like it is confusing pandas about how to apply the 420 grid-based wel multipliers. To fix this, either don't broadcast the grid-type pars across multiple stress periods, or fill the well entries across the stress periods with zero-flux (dummy) entries so that all stress periods have the same well entries.
@briochh does this jive? For some reason I thought we were trapping for this use case when add_pars() is called...
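To see whether the well entries really differ between stress periods, one option is to compare the sets of index values file by file. The sketch below uses two tiny made-up tables in place of the real wel list files; the column names (k, i, j, flux) and the entry_mismatch helper are hypothetical.

```python
import pandas as pd

# Hypothetical well lists for two stress periods: (k, i, j) -> flux
sp1 = pd.DataFrame({"k": [1, 1], "i": [5, 9], "j": [3, 4],
                    "flux": [-100.0, -50.0]})
sp2 = pd.DataFrame({"k": [1, 1, 2], "i": [5, 9, 7], "j": [3, 4, 2],
                    "flux": [-120.0, -60.0, -30.0]})


def entry_mismatch(tables, index_cols):
    """For each table, list the entries present elsewhere but missing here."""
    keys = {
        name: set(df[index_cols].itertuples(index=False, name=None))
        for name, df in tables.items()
    }
    union = set().union(*keys.values())
    return {name: sorted(union - found) for name, found in keys.items()}


missing = entry_mismatch({"sp1": sp1, "sp2": sp2}, ["k", "i", "j"])
print(missing)
```

Any non-empty list in the result means the files passed together do not share the same set of entries.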
@jtwhite79, yeah that sounds like something that could cause some dramas. We do have a check in the _par_prep() part of add_parameters() but I think this is only checking that the number of cols is consistent across files that are passed together. Sounds like this issue arises because there is a mismatch between the number of rows, right?
If this is what is causing the issue, I feel like we could handle this at forward run time, as the multipliers are aligned according to the values of index_cols, so the number of entries or even the order shouldn't matter. I think we should put together a quick test that reproduces this issue and see if this casting is feasible. One challenge might come if the values in index_cols are effectively meaningless (i.e. just a counter for the order in which they appear in the input file), which might result if we introduce the flexibility raised in #490. We primarily need to avoid casting pars to the wrong place; erroring is way better than silently doing the wrong thing! Either way, whether we trap and raise at setup, support this at run time, or error at forward run time, we should improve the opaque error above!
@jptraylor, in the meantime your best/easiest workaround would be, as @jtwhite79 suggests, to pad those input files so that the same well locs are represented in all the files that you are passing to that add_parameters() call.
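The padding workaround can be sketched with plain pandas, outside of pyemu: build the union of all well locations, then give every stress period a row for each location, using zero flux where the well was absent. The table layout, column names and the pad_well_tables helper are hypothetical; real wel files carry extra columns (for example boundnames) that would also need sensible fill values.

```python
import pandas as pd


def pad_well_tables(tables, index_cols, flux_col):
    """Give every table one row per location found in any table.

    Locations that a table lacks are added with zero flux, so a multiplier
    applied to them has no effect on the model input.
    """
    all_keys = pd.concat(
        [df[index_cols] for df in tables.values()]
    ).drop_duplicates()
    padded = {}
    for name, df in tables.items():
        merged = all_keys.merge(df, on=index_cols, how="left")
        merged[flux_col] = merged[flux_col].fillna(0.0)
        padded[name] = merged.sort_values(index_cols).reset_index(drop=True)
    return padded


# Hypothetical well lists for two stress periods
sp1 = pd.DataFrame({"k": [1, 1], "i": [5, 9], "j": [3, 4],
                    "flux": [-100.0, -50.0]})
sp2 = pd.DataFrame({"k": [1, 1, 2], "i": [5, 9, 7], "j": [3, 4, 2],
                    "flux": [-120.0, -60.0, -30.0]})

padded = pad_well_tables({"sp1": sp1, "sp2": sp2}, ["k", "i", "j"], "flux")
print(padded["sp1"])
```

The obvious cost is file size: every stress period ends up with as many rows as there are distinct well locations overall.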
@jptraylor, are you still seeing that AttributeError: 'Pandas' object has no attribute 'chkpar' error (raised at assert len(common_idx) == mlt.chkpar) in the traceback?
@jtwhite79 @briochh I'm not understanding the issue with applying the grid-scale pumping multipliers for each stress period's wel package. Each mult file located in the mult directory with the suffix grid.csv is unique for each stress period and has the same number of rows (mults) as there are wells in the associated wel file, so those match up. I guess I don't understand where the mismatch is occurring.
Yes, I got that error when I initially ran the model (which was originally built with older versions of pyemu and pandas) with the recent pyemu and pandas versions. Then, per Jeremy's suggestion, I rebuilt the pest framework with the recent pyemu/pandas versions and got that xx.get() error.
OK, that xx.get() error is really just an indication that something is failing in the multiprocessing pool -- the tracebacks from there can be a little long-winded. The true error looks to be related to
ValueError: cannot reindex on an axis with duplicate labels
which likely comes from pandas -- there may be more information further up the traceback.
Usually these pandas errors relate to duplicate indexes (or trying to create an index that will have duplicates).
Are you ok with @jtwhite79 sharing your .zip with me?
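For anyone hitting the same ValueError, here is a small pandas-only illustration of how it arises and how to locate the offending labels. The series and its labels are made up; nothing here is taken from the model files in this thread.

```python
import pandas as pd

# Hypothetical multipliers whose index contains a repeated label
mults = pd.Series([1.1, 0.9, 1.2], index=["a", "b", "b"])

try:
    # Reindexing needs unique labels to know which value goes where
    mults.reindex(["a", "b", "c"])
except ValueError as err:
    message = str(err)
    print(message)

# List every label that occurs more than once
dupes = mults.index[mults.index.duplicated(keep=False)].tolist()
print(dupes)
```

Applying the same duplicated() check to the index built from index_cols in each input file is a quick way to find which file introduces the repeats.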
Yes, sharing that model with you is fine
Looking at the mult2model_info.csv file, it looks like you are passing a list of wel files to add_pars(), because several mlt_files are being used across multiple wel list input files, and those list files have different entries for wel boundaries. For example: "mult\wel_swirr_mult_inst0_grid.csv"
@jtwhite79 One big difference I found between the original version of the model (created with older versions of pyemu/pandas) and this new one (created with newer versions of pyemu/pandas) is that the grid mult files for the wells look different. For example, we have multipliers by datasource, of which there are three pumping datasources (datasource is specified as a boundname in each wel file). The original version has a mult grid file for wel_datasource that has three entries, whereas the new one has a bunch of entries per datasource, although it's difficult to tell how the number of entries per datasource in the new file lines up with the number of actual wells. So it appears that the new pyemu/pandas is constructing these grid mults in a different way.
That's an interesting observation. @briochh I don't think we have changed the way PstFrom sets up parameters for a given set of arguments, but I could be wrong. @jptraylor do you have the original model+pest files (if so, can you email them to me)?
yes i have them, will send them.
Ok after looking at those files, the number of well parameters is the same and the mult2model_info.csv is the same. The tpl files for the well multipliers are different - the new tpl file for the "datasource" tag only has two entries compared to the original.
I found this is the call where wel parameters are added and tagged with "datasource"
wel_files = sorted(glob.glob('external/wel_*[!no_ozark].dat'))
pf.add_parameters(filenames=wel_files, par_type="grid",
                  par_name_base=f"wel_datasource_mult", pargp="wel_datasource_mult",
                  upper_bound=2., lower_bound=0.5,
                  #ult_ubound=wel_ultimate_bounds[1],
                  #ult_lbound=wel_ultimate_bounds[0],
                  index_cols=[6], use_cols=[3], par_style="multiplier",
                  comment_char='#')
but when I run wel_files = sorted(glob.glob('external/wel_*[!no_ozark].dat')), wel_files is empty in both the original and new datasets. Maybe there are some files missing in what you sent me?
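A side note on that glob pattern, separate from whether the files exist on disk: in glob/fnmatch syntax, [!no_ozark] is a character class that matches one character that is not n, o, _, z, a, r or k. It does not exclude names containing the string "no_ozark". The sketch below checks a few made-up file names against the pattern with fnmatch, which implements the same matching rules that glob uses.

```python
from fnmatch import fnmatchcase

pattern = "wel_*[!no_ozark].dat"

# Hypothetical file names, chosen to show what the character class does
names = [
    "wel_001.dat",           # last character before .dat is "1"
    "wel_001_no_ozark.dat",  # ends in "k", which is in the excluded set
    "wel_track.dat",         # also ends in "k", so it is excluded as well
    "wel_a.dat",             # ends in "a", which is in the excluded set
]

matches = {name: fnmatchcase(name, pattern) for name in names}
for name, hit in matches.items():
    print(f"{name}: {hit}")
```

If the intent is to skip files whose names contain "no_ozark", filtering the glob result with a list comprehension is less surprising than a character class.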
Well, it would help if I sent you the correct model; I've been testing things on my end and have a few versions going. Basically, the one I sent was the forward run I tried to rerun with the newer pyemu/pandas, and it bombed, so it didn't write any of the org/wel*.dat files to external/wel*.dat. I'm sending you the correct original now.
OK, I've dug in some more. I can confirm that the new-style tpl file has many more lines than the one produced by v1.2; however, they both still have only 3 unique parameters, corresponding to the 3 datasource types. It looks like the new tpl file has the maximum number of entries found in any wel list file. I seem to remember going to this style of broadcasting multiplier file for a reason, but it escapes me now (@briochh will probably remember). Anyhow, for me, if I roll back to v1.2 (py 3.10 and pandas 2.1), I get the same tpl as before, which ought to be what you are after...
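A quick way to compare two tpl files in this sense (many lines versus few unique parameters) is to count the parameter markers. This is a minimal sketch that assumes a standard PEST template file whose first line is "ptf" followed by the marker character; the sample text and the tpl_parameter_summary helper are hypothetical, and the column layout shown is not necessarily what PstFrom writes.

```python
import re


def tpl_parameter_summary(tpl_text):
    """Return (number of marked entries, sorted unique parameter names)."""
    lines = tpl_text.strip().splitlines()
    header = lines[0].split()
    if len(header) != 2 or header[0].lower() != "ptf":
        raise ValueError("first line should look like 'ptf ~'")
    marker = re.escape(header[1])
    # A parameter appears as: marker, optional spaces, name, optional spaces, marker
    pattern = re.compile(rf"{marker}\s*([^{marker}\s]+)\s*{marker}")
    names = [name for line in lines[1:] for name in pattern.findall(line)]
    return len(names), sorted(set(names))


# Hypothetical template content with four entries but three parameters
sample = """ptf ~
idx,tag,mult
0,a,~  wel_ds_a  ~
1,a,~  wel_ds_a  ~
2,b,~  wel_ds_b  ~
3,c,~  wel_ds_c  ~
"""

n_entries, unique_names = tpl_parameter_summary(sample)
print(n_entries, unique_names)
```

Two tpl files that give the same unique names but different entry counts define the same parameters and differ only in how many rows each parameter is written to.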
I'm using py 3.11 right now. Are you suggesting I get a 3.10 version set up with pyemu v1.2? My Python 3.11 with pyemu v1.2 doesn't work; it breaks.
@jtwhite79 what is the significance of the mwe.zip? Maybe I'm missing something, but they look to be the same as the original wel files I sent.
I just added those as files for @briochh in case he wanted to check it out. The tpl files look to be the same as the original if I use the version listed above...
@jptraylor, looking into this now. If I am reading it right, you are trying to set up a parameter for each "type" of well and broadcast it across all well files. Is this correct?
There are a few challenges here but this should be possible. One option would be to make use of the use_rows option in add_parameters(). Using this, it should be possible to make multiple calls to add_parameters() using par_type='constant' and passing the index value that you want to the use_rows argument (e.g. use_rows=[(iwum)]).
Unfortunately, there are a few moving parts here with changes within pandas and the subsequent refactoring of the methods in pyemu. If we look at the latest pyemu release version (1.3.3) and the latest pandas 2.1 I think you might need to make sure that if a well exists in 1 file it is also present in another. However, I have a feeling this might make your well files huge?
I am exploring how this use case will play with later versions of pyemu and pandas -- it may be that some specific patches are required... watch this space.
@jptraylor, you could try an approach like this:
import os

import pandas as pd
import pyemu


def main():
    sim_ws = "clean"
    template_ws = "test"
    pf = pyemu.utils.PstFrom(original_d=sim_ws, new_d=template_ws,
                             remove_existing=True,
                             longnames=True,
                             zero_based=False, tpl_subfolder="tpl")
    wel_files = [f for f in os.listdir(template_ws)
                 if f.startswith("wel_") and f.endswith(".dat")]

    # read in all of the well files to get the index info
    fullwel = {}
    headers = {}
    for f in wel_files:
        kper = f.split("_")[-1].split(".")[0]
        with open(os.path.join(pf.new_d, f), "r") as fp:
            headers[kper] = fp.readline()
            fullwel[kper] = pd.read_csv(fp, header=None, sep=r"\s+")

    # concat to one big nasty multi index
    fulweldf = (pd.concat(fullwel, names=["kper", "idx"])
                .set_index([0, 1, 2, 6], append=True)
                .droplevel("idx"))

    # loop over the unique entries for that column 6
    for tag in fulweldf.index.unique(level=6):
        # rows carrying this tag, pooled over all files; this can be an
        # inclusive list (the rows don't have to be in every file).
        # Restricting the rows to the current tag is an assumption about the intent.
        tag_rows = (fulweldf.xs(tag, level=6, drop_level=False)
                    .index.droplevel("kper").unique().tolist())
        # one add pars across all files for each group
        pf.add_parameters(filenames=wel_files, par_type="constant",
                          par_name_base=f"wel-datasource-mult_id:{tag}",
                          pargp="wel-datasource-mult",
                          upper_bound=2., lower_bound=0.5,
                          #ult_ubound=wel_ultimate_bounds[1],
                          #ult_lbound=wel_ultimate_bounds[0],
                          index_cols=[0, 1, 2, 6], use_cols=[3],
                          par_style="multiplier",
                          comment_char="#",
                          use_rows=tag_rows)

    pf.add_observations(filename=wel_files[0], index_cols=[0, 1, 2],
                        use_cols=[3], ofile_sep=r"\s+")
    pst = pf.build_pst()
    pst.write_input_files(pf.new_d)

    # apply the multipliers from inside the template dir, then always return
    bd = os.getcwd()
    os.chdir(pf.new_d)
    try:
        pyemu.helpers.apply_list_and_array_pars(chunk_len=1000)
    finally:
        os.chdir(bd)


if __name__ == "__main__":
    main()
@jtwhite79 I was able to get everything running with Python 3.10, pyemu 2.1 and pandas 1.5.3 (pandas 2.1 threw a deprecation error from pyemu/helpers.py about the kwarg being renamed from line_terminator to lineterminator).