Feature: Standardize Trace Output Schema to Simplify Reduction Tasks
Is your feature request related to a problem? Please describe.
Main Roads Western Australia is interested in creating a simple data visualization that makes the ActivitySim trace output more useful for debugging modifications to the donor model they are building from. Small changes to the format of these files, specifically writing each file to a fixed schema, would make this work much easier. The current schemas for an example nested logit simulation are as follows:
- base_probabilities: alternative,probability
- choices: [chooser_id],[choice] (for example: household_id,auto_ownership)
- choosers: label,value
- eval_utils: Expression,label,0
- exp_utilities: alternative,utility (though the first row is the chooser_id)
- nested_probabilities: alternative,probability (though the first row is the chooser_id)
- rands: [chooser_id],rand
- raw_utilities: alternative,utility (though the first row is the chooser_id)
The inconsistency in these file structures requires unnecessary and messy data manipulation to consolidate this information into a single, rational database suitable for data visualization.
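To make the manipulation burden concrete, here is a minimal sketch of the per-file adapters a consumer has to write today. It assumes only the layouts listed above; the function names `from_id_value_file` and `from_alternative_file` are hypothetical, the sample file contents are illustrative rather than real ActivitySim output, and the output column order follows the schema I propose in the next section.

```python
import csv
import io


def from_id_value_file(text, trace_dimension):
    """Adapter for files shaped [chooser_id],[value], e.g. choices and rands.

    The chooser variable name is taken from the first header column.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # e.g. ["household_id", "auto_ownership"]
    chooser_variable = header[0]
    return [
        [trace_dimension, chooser_variable, row[0], "NA", "NA", row[1], "NA"]
        for row in reader
        if row
    ]


def from_alternative_file(text, trace_dimension, chooser_variable, chooser_id):
    """Adapter for files shaped alternative,value, e.g. base_probabilities.

    The chooser is not a column in this layout, so the caller must supply it.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the alternative,probability header
    return [
        [trace_dimension, chooser_variable, chooser_id, row[0], "NA", row[1], "NA"]
        for row in reader
        if row
    ]


# Illustrative file contents only (assumed, not copied from a real run).
choices_text = "household_id,auto_ownership\n12345,0\n"
probs_text = "alternative,probability\n0_CARS,0.775\n1_CAR,0.225\n"

rows = from_id_value_file(choices_text, "choices")
rows += from_alternative_file(
    probs_text, "base_probabilities", "household_id", "12345"
)
```

Each layout needs its own adapter, and some need information (the chooser id) that is not a column in the file at all. The files whose first row is the chooser_id would need a third variant.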
Describe the solution you'd like
I recommend the following standard schema for all the debug files:
trace_dimension, chooser_variable, chooser_id, alternative, trace_label, trace_value, trace_note
Examples of what each of the above files would look like under this schema are as follows:
- base_probabilities
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  base_probabilities,household_id,12345,0_CARS,NA,0.775,NA
- choices
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  choices,household_id,12345,NA,NA,0,NA
- choosers
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  choosers,household_id,12345,NA,TAZ,123,NA
- eval_utils
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  eval_utils,household_id,12345,0_CARS,@df.num_drivers==1,-1.05,NA
- exp_utilities
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  exp_utilities,household_id,12345,0_CARS,NA,7891.0,NA
- nested_probabilities
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  nested_probabilities,household_id,12345,zero_car_nest,NA,0.850,NA
- rands
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  rand,household_id,12345,NA,NA,0.123,NA
- raw_utilities
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  raw_utilities,household_id,12345,0_CARS,NA,2.43,NA
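Rows in this schema could be produced by one small shared writer. The sketch below is only an illustration of the idea: `TraceRow` and `write_trace_rows` are hypothetical names, not existing ActivitySim functions, and using "NA" as the default for unused fields is an assumption taken from the examples above.

```python
import csv
import io
from typing import NamedTuple


class TraceRow(NamedTuple):
    """One record in the proposed fixed schema; unused fields default to NA."""

    trace_dimension: str
    chooser_variable: str
    chooser_id: str
    alternative: str = "NA"
    trace_label: str = "NA"
    trace_value: str = "NA"
    trace_note: str = "NA"


def write_trace_rows(rows, handle):
    """Write the fixed header followed by the given rows as CSV."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(TraceRow._fields)
    writer.writerows(rows)


# Reproduce the eval_utils example row from the list above.
buffer = io.StringIO()
write_trace_rows(
    [
        TraceRow(
            "eval_utils", "household_id", "12345",
            "0_CARS", "@df.num_drivers==1", "-1.05",
        )
    ],
    buffer,
)
output = buffer.getvalue()
```

Going through the `csv` module rather than joining strings by hand means an expression that contains a comma would be quoted automatically instead of shifting the columns.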
Describe alternatives you've considered
A module could be written to consume the existing trace output and consolidate it into a single, rational database that could be used with data visualization software. I think a standard schema would be better, as it would allow additional trace output files to be created and written. With the output structured in this way, straightforward concatenations, joins, and other reductions can be used to consolidate the output, as needed.
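As a rough sketch of the kind of reduction I have in mind, the snippet below concatenates two files in the proposed schema and pivots them to one record per chooser and alternative. The two sample rows are the ones from the examples above; `read_trace` is a hypothetical helper, and in-memory strings stand in for the trace files.

```python
import csv
import io
from collections import defaultdict

HEADER = (
    "trace_dimension,chooser_variable,chooser_id,"
    "alternative,trace_label,trace_value,trace_note\n"
)

# In-memory stand-ins for two trace files written in the proposed schema.
raw_utilities = HEADER + "raw_utilities,household_id,12345,0_CARS,NA,2.43,NA\n"
base_probabilities = (
    HEADER + "base_probabilities,household_id,12345,0_CARS,NA,0.775,NA\n"
)


def read_trace(text):
    """Parse one trace file into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# Concatenation: every file has the same columns, so lists simply append.
combined = read_trace(raw_utilities) + read_trace(base_probabilities)

# Join: one record per (chooser_id, alternative), one field per trace_dimension.
by_alternative = defaultdict(dict)
for row in combined:
    key = (row["chooser_id"], row["alternative"])
    by_alternative[key][row["trace_dimension"]] = float(row["trace_value"])
```

Because the columns are identical across files, no per-file parsing logic is needed, and adding a new trace file later would not change this code.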
Additional context
None