urbansim_templates

Better reporting and diagnostics

Open smmaurer opened this issue 6 years ago • 1 comments

Better reporting about missing values for MNL:

When data tables have missing values, those rows are automatically filtered out (I think by Patsy) before models are estimated or predicted values are calculated.

We should provide clearer reporting about this, so that users understand what's going on and what the scope of the missing data is.
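As a rough illustration of the kind of report that might help, here is a minimal pandas-only sketch. The function name `summarize_missing`, the `buildings` table, and its columns are hypothetical and not part of urbansim_templates; the idea is just to count missing values per column, and the rows that would be lost, before anything is filtered out.

```python
import pandas as pd


def summarize_missing(df, columns=None):
    """Count missing values per column and rows with at least one missing value.

    Hypothetical helper for illustration; not an urbansim_templates function.
    """
    cols = list(columns) if columns is not None else list(df.columns)
    missing = df[cols].isna()
    per_column = missing.sum()
    # Rows that a listwise deletion step would drop
    rows_affected = int(missing.any(axis=1).sum())
    # Keep only the columns that actually have missing values
    return per_column[per_column > 0], rows_affected


# Made-up example data
buildings = pd.DataFrame({
    "price": [100.0, None, 250.0, 300.0],
    "sqft": [900.0, 1200.0, None, 1500.0],
    "zone_id": [1, 2, 2, 3],
})

per_column, rows_affected = summarize_missing(buildings, ["price", "sqft", "zone_id"])
print(per_column)
print(f"{rows_affected} of {len(buildings)} rows have missing values")
```

Passing only the columns used in the model expression would keep the report limited to the rows that actually affect estimation or prediction.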

smmaurer, Jul 03 '18 00:07

We should have better reporting of missing values for network aggregations as well.

Currently, the status messages look like this:

Computing pop_10000
Removed 191599 rows because they contain missing values

Where is this coming from?

For each aggregation that’s calculated, urbansim.utils.networks.from_yaml() gets a copy of the dataframe whose values are being aggregated (for example the buildings table) and runs a couple of pandana operations on it. The dataframe includes the node id column, the column being aggregated, and any other columns referenced in the aggregation instructions, e.g. filters. https://github.com/UDST/urbansim/blob/master/urbansim/utils/networks.py#L52-L55

pandana.Network.set() runs df.dropna() on the dataframe, and reports the number of rows removed. It looks like this would include missing values in the node id column, the column being aggregated, the filter columns, etc. https://github.com/UDST/pandana/blob/master/pandana/network.py#L227-L236

More details would be helpful for catching data problems.
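One possible shape for a more detailed message, sketched with pandas only: before the dataframe is handed to pandana, break the dropped-row count down by column. The helper name `describe_dropna` and the sample columns are hypothetical; this does not call or modify pandana, it only mirrors what `df.dropna()` would remove.

```python
import pandas as pd


def describe_dropna(df, label):
    """Return status lines describing which columns have missing values.

    Hypothetical helper for illustration; not part of urbansim or pandana.
    """
    missing = df.isna()
    # Same rows that df.dropna() would remove (any missing value in the row)
    n_dropped = int(missing.any(axis=1).sum())
    lines = [f"{label}: {n_dropped} of {len(df)} rows contain missing values"]
    for col in df.columns:
        n = int(missing[col].sum())
        if n:
            lines.append(f"  {col}: {n} missing")
    return lines


# Made-up data: node id column, aggregated column, and a filter column
df = pd.DataFrame({
    "node_id": [1.0, None, 3.0, 4.0],
    "population": [10.0, 20.0, None, None],
    "building_type": [1.0, 1.0, 2.0, None],
})

lines = describe_dropna(df, "pop_10000")
print("\n".join(lines))
```

A per-column breakdown like this would make it easier to tell whether rows are lost because of unmatched node ids, gaps in the aggregated variable, or gaps in a filter column.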

smmaurer, Jul 16 '18 20:07