
KeyError: zone not in index

Open 4Step opened this issue 1 year ago • 3 comments

Describe the bug
The bug arises in the Group Quarters (GQ) model when the control file is set at the zone level and the land-use data contain GQ data (Group_quarters_pop_noninstitutionalized) for only a few zones (about 25% of all zones). The PUMS-to-TAZ crosswalk file includes all zones, even those with no GQ data. As PopulationSim runs, it loops over each PUMA and selects the zones to process. The bug is somewhere in this step; the run crashes with the following error:

KeyError: u'the label [8619] is not in the [index]'
Closing remaining open files:C:\TSM_NextGen_v5\PopSim\Florida\Setup\output\GQ\pipeline.h5...done

Workaround
Use a separate GQ crosswalk file that lists only the zones (and their PUMAs) for which GQ data exist. However, this is inconvenient, because each model year may have a different set of GQ zones.
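Because the GQ zone set changes by model year, the separate crosswalk can be regenerated from the full crosswalk instead of being maintained by hand. The sketch below assumes the crosswalk has `TAZ` and `PUMA` columns and that the zonal control data carry the `Group_quarters_pop_noninstitutionalized` field; these column names are assumptions, so adjust them to your own files. It is a preprocessing helper, not part of PopulationSim.

```python
import pandas as pd

# Assumed column names; change these to match your crosswalk and control data.
ZONE_COL = "TAZ"
PUMA_COL = "PUMA"
GQ_COL = "Group_quarters_pop_noninstitutionalized"


def build_gq_crosswalk(crosswalk: pd.DataFrame, controls: pd.DataFrame) -> pd.DataFrame:
    """Keep only crosswalk rows whose zone has a non-zero GQ control."""
    gq = controls[[ZONE_COL, GQ_COL]].copy()
    # Treat missing GQ values the same as zero.
    gq[GQ_COL] = gq[GQ_COL].fillna(0)
    gq_zones = gq.loc[gq[GQ_COL] > 0, ZONE_COL]
    return crosswalk[crosswalk[ZONE_COL].isin(gq_zones)].reset_index(drop=True)


if __name__ == "__main__":
    # Toy data standing in for one model year's inputs.
    crosswalk = pd.DataFrame({ZONE_COL: [1, 2, 3, 4], PUMA_COL: [100, 100, 200, 200]})
    controls = pd.DataFrame({ZONE_COL: [1, 2, 3, 4], GQ_COL: [0, 12, None, 5]})
    print(build_gq_crosswalk(crosswalk, controls))
```

Running this once per model year and writing the result with `DataFrame.to_csv` to whichever file the GQ run uses as its crosswalk removes the manual step.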

The log file shows the following details:

INFO - initial_seed_balancing seed id 8619
Traceback (most recent call last):
  File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\run_populationsim.py", line 62, in <module>
    pipeline.run(models=steps, resume_after=resume_after)
  File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\activitysim\core\pipeline.py", line 571, in run
    run_model(model)
  File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\activitysim\core\pipeline.py", line 472, in run_model
    orca.run([step_name])
  File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\orca\orca.py", line 1992, in run
    step()
  File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\orca\orca.py", line 797, in __call__
    return self._func(**kwargs)
  File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\populationsim\steps\initial_seed_balancing.py", line 82, in initial_seed_balancing
    control_totals=seed_controls_df.loc[seed_id],
  File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\pandas\core\indexing.py", line 1478, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\pandas\core\indexing.py", line 1911, in _getitem_axis
    self._validate_key(key, axis)
  File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\pandas\core\indexing.py", line 1798, in _validate_key
    error()
  File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\pandas\core\indexing.py", line 1785, in error
    axis=self.obj._get_axis_name(axis)))
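The traceback ends in `seed_controls_df.loc[seed_id]` (initial_seed_balancing.py, line 82). A label-based `.loc` lookup raises `KeyError` whenever the label is not in the index, so seed id 8619 is evidently missing from the seed controls table at that point. The toy frame below only borrows the variable name from the traceback; its contents and the `gq_pop` column are hypothetical and do not reproduce PopulationSim's actual seed control table.

```python
import pandas as pd

# Hypothetical stand-in for seed_controls_df: one row per seed (PUMA) id that
# ended up with controls. Seed 8619 is deliberately absent.
seed_controls_df = pd.DataFrame(
    {"gq_pop": [40.0, 15.0]},
    index=pd.Index([8618, 8620], name="seed_id"),
)


def lookup_controls(seed_id):
    """Mimic the failing call: label-based lookup raises KeyError if absent."""
    return seed_controls_df.loc[seed_id]


def has_controls(seed_id) -> bool:
    """A membership test on the index avoids the exception altogether."""
    return seed_id in seed_controls_df.index


if __name__ == "__main__":
    print(has_controls(8619))  # False
    try:
        lookup_controls(8619)
    except KeyError as err:
        print("KeyError:", err)
```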

To Reproduce
Steps to reproduce the behavior:

  1. Select a few random zones and remove the GQ data from the field referenced in controls.csv.
  2. Run PopulationSim; it should produce the out-of-index error.
  3. The reported zone number is one after the actual zone with no GQ data.
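Step 1 can be scripted so the test input is repeatable. This sketch assumes that "removing" GQ data means setting the field to 0 (blanking it to a missing value is another reading) and that the land-use table has one row per zone; the function name is hypothetical.

```python
import numpy as np
import pandas as pd

GQ_COL = "Group_quarters_pop_noninstitutionalized"  # field referenced in controls.csv


def blank_gq_for_random_zones(land_use: pd.DataFrame, frac: float, seed: int = 0) -> pd.DataFrame:
    """Return a copy with the GQ field set to 0 in a random fraction of zones.

    Assumes one row per zone; the original frame is left unchanged.
    """
    out = land_use.copy()
    rng = np.random.default_rng(seed)  # fixed seed so the test input is reproducible
    n = int(round(frac * len(out)))
    picked = rng.choice(out.index.to_numpy(), size=n, replace=False)
    out.loc[picked, GQ_COL] = 0
    return out
```

For example, `blank_gq_for_random_zones(land_use, 0.75)` leaves GQ in roughly a quarter of the zones, similar to the situation described above.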

Expected behavior
Skip the zones with no GQ data.
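One way to express this expected behavior is to decide up front which seeds (PUMAs) have any GQ control at all, then balance only those. This is a sketch outside PopulationSim, not its code or a proposed patch; the function name and the `TAZ`/`PUMA` column names are assumptions.

```python
import pandas as pd

GQ_COL = "Group_quarters_pop_noninstitutionalized"


def seeds_to_process(crosswalk: pd.DataFrame, zone_controls: pd.DataFrame,
                     zone_col: str = "TAZ", seed_col: str = "PUMA") -> list:
    """Return seed ids whose zones carry a positive total GQ control.

    Zones missing from zone_controls are treated as having zero GQ.
    """
    merged = crosswalk.merge(zone_controls[[zone_col, GQ_COL]], on=zone_col, how="left")
    merged[GQ_COL] = merged[GQ_COL].fillna(0)
    totals = merged.groupby(seed_col)[GQ_COL].sum()
    return sorted(totals[totals > 0].index.tolist())
```

A balancing loop would then iterate over `seeds_to_process(...)` instead of every PUMA in the crosswalk, so a seed with no GQ controls is never looked up.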


Additional context
Supplying a separate crosswalk file that lists only the TAZs with GQ data works fine. The problem may lie in how the crosswalk file is looped over and processed. A temporary internal crosswalk containing only the zones with GQ controls might help.

4Step, Dec 06 '24 14:12

Shouldn't this be posted on the popsim issues page? For context, Oregon always runs the GQ popsim separately from the general population (GP) popsim run and then stitches the two back together. This is done because most of the controls relate only to the general population, so trying to match both GP and GQ in the same simulation run creates an internal conflict within the popsim balancing.
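The stitching step can be as simple as concatenating the two synthetic household tables and renumbering the GQ ids so they do not collide with the GP ids. This sketch assumes both runs write a `household_id` column; that name and the `is_gq` flag are assumptions, since the actual output columns depend on each run's settings.

```python
import pandas as pd


def stitch_households(gp: pd.DataFrame, gq: pd.DataFrame,
                      id_col: str = "household_id") -> pd.DataFrame:
    """Concatenate GP and GQ synthetic households, offsetting GQ ids to keep them unique."""
    gq = gq.copy()
    # Shift GQ ids past the largest GP id (0 if the GP table is empty).
    offset = int(gp[id_col].max()) if len(gp) else 0
    gq[id_col] = gq[id_col] + offset
    # Flag the source of each record so the two populations stay distinguishable.
    gp = gp.assign(is_gq=False)
    gq = gq.assign(is_gq=True)
    return pd.concat([gp, gq], ignore_index=True)
```

A persons table would need the same offset applied to its household id column before the two persons tables are concatenated.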

bettinardi, Dec 06 '24 14:12

@4Step it might be helpful if you could post the data somewhere so we can replicate the error you are receiving. Note that I transferred this issue from ActivitySim to PopulationSim.

jfdman, Dec 06 '24 19:12


@bettinardi, our implementation is similar to Oregon's: we run the general population and the GQ population separately and then combine them. To clarify, the issue I posted concerns the GQ application, which requires a second crosswalk file listing only the zones with non-zero GQ (using the general population crosswalk file results in the error I posted).

4Step, Dec 09 '24 13:12