urbansim_templates
Better reporting and diagnostics
Better reporting about missing values for MNL
When data tables have missing values, the affected rows are automatically filtered out (I think by Patsy) before models are estimated or predicted values are calculated.
We should report this more clearly, so that users understand what is happening and how much of the data is affected.
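As a sketch of what clearer reporting could look like, the hypothetical helper below (report_missing is not part of urbansim_templates) checks the columns a model will use before estimation and reports how many rows a drop-missing-values step would remove and which columns are responsible. It assumes the data is a pandas DataFrame and that the list of columns the model uses is already known; it does not call Patsy.

```python
import pandas as pd


def report_missing(df, columns, label="table"):
    """Summarize missing values in the columns a model will use.

    Hypothetical helper: returns the number of rows that a drop-missing-values
    step would remove, plus the per-column count of missing values.
    """
    subset = df[list(columns)]
    # A row would be dropped if any of the model's columns is missing.
    row_has_na = subset.isna().any(axis=1)
    n_dropped = int(row_has_na.sum())
    per_column = subset.isna().sum()
    per_column = per_column[per_column > 0].astype(int).to_dict()

    if n_dropped:
        share = n_dropped / len(df)
        print(f"{label}: {n_dropped} of {len(df)} rows ({share:.1%}) "
              f"have missing values and would be dropped")
        for col, count in per_column.items():
            print(f"  {col}: {count} missing")
    return n_dropped, per_column


# Illustrative choosers table; the column names are assumptions.
choosers = pd.DataFrame({
    "income": [50_000, None, 72_000, 31_000],
    "persons": [2, 3, None, None],
    "tenure": [1, 2, 1, 2],
})
n_dropped, per_column = report_missing(
    choosers, ["income", "persons", "tenure"], label="choosers")
```

Running the check before the model spec is handed to Patsy means the message can name the columns, which a bare row count cannot.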
We should have better reporting of missing values for network aggregations as well.
Currently, the status messages look like this:
Computing pop_10000
Removed 191599 rows because they contain missing values
Where is this coming from?
For each aggregation that’s calculated, urbansim.utils.networks.from_yaml() gets a copy of the dataframe whose values are being aggregated (for example, the buildings table) and runs a couple of pandana operations on it. The dataframe includes the node id column, the column being aggregated, and any other columns referenced in the aggregation instructions, such as filters. https://github.com/UDST/urbansim/blob/master/urbansim/utils/networks.py#L52-L55
pandana.Network.set() runs df.dropna() on the dataframe and reports the number of rows removed. It looks like this would count rows with missing values in any of these columns: the node id column, the column being aggregated, the filter columns, etc. https://github.com/UDST/pandana/blob/master/pandana/network.py#L227-L236
More details, such as which columns the missing values are in, would be helpful for catching data problems.
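A sketch of more detailed reporting for the network case: the hypothetical function below (dropna_with_report is not pandana code) removes exactly the rows that df.dropna() would remove, but also breaks the count down by column, so a missing node id can be told apart from a missing value in the aggregated column or a filter column. The message loosely mirrors the status output quoted above; the column names and the aggregation name are illustrative assumptions.

```python
import pandas as pd


def dropna_with_report(df, name):
    """Drop rows with missing values and report which columns caused it.

    Hypothetical replacement for a bare df.dropna(): the same rows are
    removed, but the message breaks the count down by column.
    """
    na = df.isna()
    row_has_na = na.any(axis=1)
    n_removed = int(row_has_na.sum())
    if n_removed:
        print(f"Computing {name}: removed {n_removed} of {len(df)} rows "
              f"because they contain missing values")
        # Per-column counts can overlap: one row may be missing several values.
        for col, count in na.sum().items():
            if count:
                print(f"  {col}: {int(count)} rows missing")
    return df[~row_has_na]


# Illustrative buildings subset: node id, aggregated column, filter column.
buildings = pd.DataFrame({
    "node_id": [10, 11, None, 12, 13],
    "residential_units": [4, None, 2, 1, None],
    "year_built": [1990, 2005, 1978, None, 2012],
})
clean = dropna_with_report(buildings, "units_1000")
```

Because the per-column counts can overlap, they are reported alongside the total rather than summed.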