activitysim
Check input data consistency
| Idea | Level of effort | Notes | Priority |
|---|---|---|---|
| Check input data consistency | Days | Check all the key relationships, in particular primary-key joins across input tables: household (HH) home zone vs. the TAZ/MAZ land use file, TAPs in the MAZtoTAP file vs. TAP skims, TAZs in the land use file vs. TAZ skims, etc. | High |
I'm supportive of this, provided that it is a configurable, expandable, and customizable list of checks, similar to what ODOT has developed. I would not support a hard-coded set of checks.
For reference, see what ODOT developed: https://github.com/RSGInc/SOABM/blob/master/template/inputChecker/config/inputs_checks.csv
I think we could do that. It will probably require the entire task budget, which I think would be OK since this is an important item. I think the solution would be both a user-configurable input validator using expressions and primary key / table join checks for data structures already built into the framework. The expressions could be used to check input data consistency against what's required by the downstream submodel expressions. For example, all households must have a household.type value that is acceptable, say in [1,2,3,4]. The codebook of valid input data values could be defined in a settings file such as input_validation.yaml and then made available to the input validator. The reason for hard-wiring some of the checks, such as requiring that every household be in a zone and that the zone be in the skims, is that this relationship is already assumed in the code, so it should be checked. Put another way: why would a user change it? It doesn't make sense to change it, given that it's required by the code.
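To make the proposal above concrete, here is a minimal sketch of what a settings-driven validator might look like. It is not ActivitySim code: the `VALIDATION_SETTINGS` dict stands in for a parsed `input_validation.yaml`, and the function name `validate_inputs`, the settings keys, and the `home_zone` / `zone` column names are all hypothetical. Tables are plain lists of dicts so the sketch runs with no dependencies; in practice they would presumably be pandas DataFrames.

```python
# Hypothetical stand-in for a parsed input_validation.yaml (assumed layout,
# not an existing ActivitySim settings format).
VALIDATION_SETTINGS = {
    # Codebook checks: a column may only contain the listed values.
    "value_checks": [
        {"table": "households", "column": "type", "valid_values": [1, 2, 3, 4]},
    ],
    # Key checks: every value in table.column must exist in ref_table.ref_column.
    "key_checks": [
        {"table": "households", "column": "home_zone",
         "ref_table": "land_use", "ref_column": "zone"},
    ],
}


def validate_inputs(tables, settings):
    """Run all configured checks; return a list of failure messages."""
    failures = []

    for check in settings.get("value_checks", []):
        valid = set(check["valid_values"])
        rows = tables[check["table"]]
        bad = [row for row in rows if row[check["column"]] not in valid]
        if bad:
            failures.append(
                f"{check['table']}.{check['column']}: "
                f"{len(bad)} row(s) with values outside {sorted(valid)}"
            )

    for check in settings.get("key_checks", []):
        known = {row[check["ref_column"]] for row in tables[check["ref_table"]]}
        used = {row[check["column"]] for row in tables[check["table"]]}
        missing = sorted(used - known)
        if missing:
            failures.append(
                f"{check['table']}.{check['column']}: values {missing} "
                f"not found in {check['ref_table']}.{check['ref_column']}"
            )

    return failures


# Made-up example inputs: household 2 has an invalid type,
# household 3 sits in a zone that is absent from the land use table.
tables = {
    "households": [
        {"household_id": 1, "type": 1, "home_zone": 10},
        {"household_id": 2, "type": 5, "home_zone": 11},
        {"household_id": 3, "type": 2, "home_zone": 99},
    ],
    "land_use": [{"zone": 10}, {"zone": 11}],
}

failures = validate_inputs(tables, VALIDATION_SETTINGS)
for message in failures:
    print(message)
```

In a layout like this, the join checks that the code already assumes (households in zones, zones in skims) could ship as default `key_checks` entries, while users add their own codebook entries without touching code.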
Well... Just to be clear on my thumbs up: it only applies to a non-hard-coded solution. I give a grumpy-face thumbs down to spending money on hard-coding this check when it could be folded into a flexible / configurable option.
I could support (thumbs up) a flexible solution that eats the whole existing budget, but only if the rest of the team agreed. In summary, my vote is likely to skip this item, knock off some other input issues, and come back to this with a scope specific to setting up a flexible / configurable option.