activitysim
Feature: Standardize Trace Output Schema to Simplify Reduction Tasks
Is your feature request related to a problem? Please describe.
Main Roads Western Australia is interested in creating a simple data visualization that makes the ActivitySim trace output more useful for debugging the modifications they are making to the donor model they are building from. Small changes to the format of these files, specifically writing each file with a fixed schema, would make this work much easier. The current schemas for an example nested logit simulation are as follows:
- base_probabilities: alternative,probability
- choices: [chooser_id],[choice] (for example: household_id,auto_ownership)
- choosers: label,value
- eval_utils: Expression,label,0
- exp_utilities: alternative,utility (though the first row is the chooser_id)
- nested_probabilities: alternative,probability (though the first row is the chooser_id)
- rands: [chooser_id],rand
- raw_utilities: alternative,utility (though the first row is the chooser_id)
The inconsistency in these file structures requires unnecessary and messy data manipulation to consolidate this information into a single, rational database suitable for data visualization.
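As an illustration of that manipulation, the sketch below converts one of the transposed two-column files (raw_utilities, exp_utilities, nested_probabilities) into the standard long schema recommended below. It is a hypothetical helper, not part of ActivitySim, and it assumes the layout described above: a first row holding the chooser variable and chooser_id, followed by one alternative,value row per alternative. Each of the other current layouts (choices, choosers, eval_utils, rands) would need its own variant of this logic, which is the burden a fixed schema would remove.

```python
import csv
import io


def melt_transposed_trace(text, trace_dimension):
    """Convert a two-column, chooser-first trace file into standard-schema rows.

    Hypothetical helper. Assumes the first row is "<chooser_variable>,<chooser_id>"
    and every later row is "<alternative>,<value>".
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # drop blank lines
    chooser_variable, chooser_id = rows[0]
    records = []
    for alternative, value in rows[1:]:
        records.append({
            "trace_dimension": trace_dimension,
            "chooser_variable": chooser_variable,
            "chooser_id": chooser_id,
            "alternative": alternative,
            "trace_label": "NA",  # these files carry no per-expression label
            "trace_value": value,
            "trace_note": "NA",
        })
    return records


if __name__ == "__main__":
    print(melt_transposed_trace("household_id,12345\n0_CARS,2.43\n", "raw_utilities"))
```

For the input shown, the result is a single record with chooser_id 12345, alternative 0_CARS and trace_value 2.43 (kept as a string, since it is read with the csv module).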
Describe the solution you'd like
I recommend the following standard schema for all the debug files:
trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
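A writer for this schema could be as small as the following sketch. The function name write_trace_csv is hypothetical and not an existing ActivitySim API; it uses Python's standard csv module, writes any column a record does not supply as NA (matching the examples below), and raises an error on unknown columns so that every file keeps exactly the same header.

```python
import csv
import io

# Column order recommended in this issue for every trace file.
TRACE_COLUMNS = [
    "trace_dimension", "chooser_variable", "chooser_id", "alternative",
    "trace_label", "trace_value", "trace_note",
]


def write_trace_csv(rows, stream):
    """Write trace records (dicts) to `stream` using the fixed schema.

    Missing columns are filled with "NA" via `restval`; keys outside the
    schema raise ValueError (DictWriter's default extrasaction="raise").
    """
    writer = csv.DictWriter(stream, fieldnames=TRACE_COLUMNS, restval="NA")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    write_trace_csv(
        [{"trace_dimension": "choices", "chooser_variable": "household_id",
          "chooser_id": 12345, "trace_value": 0}],
        buf,
    )
    print(buf.getvalue())
```

Run as a script, this prints the header followed by choices,household_id,12345,NA,NA,0,NA, the same row as the choices example below.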
Under this schema, each of the files listed above would look like this:
- base_probabilities
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  base_probabilities,household_id,12345,0_CARS,NA,0.775,NA
- choices
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  choices,household_id,12345,NA,NA,0,NA
- choosers
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  choosers,household_id,12345,NA,TAZ,123,NA
- eval_utils
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  eval_utils,household_id,12345,0_CARS,@df.num_drivers==1,-1.05,NA
- exp_utilities
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  exp_utilities,household_id,12345,0_CARS,NA,7891.0,NA
- nested_probabilities
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  nested_probabilities,household_id,12345,zero_car_nest,NA,0.850,NA
- rands
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  rand,household_id,12345,NA,NA,0.123,NA
- raw_utilities
  trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note
  raw_utilities,household_id,12345,0_CARS,NA,2.43,NA
Describe alternatives you've considered
A module could be written to consume the existing trace output and consolidate it into a single, rational database that could be used with data visualization software. I think a standard schema would be better, as additional trace output files could then be created and written without changing how the output is consolidated. With the output structured in this way, straightforward concatenations, joins, and other reductions can be used to consolidate the output as needed.
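The sketch below illustrates why a shared header makes reduction straightforward: files are concatenated by simply appending rows, and a single pass groups the values for one chooser by alternative. The helper names are hypothetical and not part of ActivitySim; the sample rows are taken from the examples in this issue, and values stay as strings because they are read with the standard csv module.

```python
import csv
import io
from collections import defaultdict

# Hypothetical helpers; not part of ActivitySim.


def read_trace_csv(text):
    """Parse one standard-schema trace file into a list of dicts (values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def concat_traces(texts):
    """Concatenate several trace files; this works because they share one header."""
    rows = []
    for text in texts:
        rows.extend(read_trace_csv(text))
    return rows


def values_by_alternative(rows, chooser_id):
    """Reduce to {alternative: {key: trace_value}} for a single chooser.

    Rows without an alternative (e.g. choices, rands) are skipped. The key is the
    trace_dimension, suffixed with the trace_label when one is present, so that
    expression-level eval_utils rows do not overwrite each other.
    """
    summary = defaultdict(dict)
    for row in rows:
        if row["chooser_id"] != str(chooser_id) or row["alternative"] == "NA":
            continue
        key = row["trace_dimension"]
        if row["trace_label"] != "NA":
            key += ":" + row["trace_label"]
        summary[row["alternative"]][key] = row["trace_value"]
    return dict(summary)


if __name__ == "__main__":
    header = "trace_dimension,chooser_variable,chooser_id,alternative,trace_label,trace_value,trace_note\n"
    files = [
        header + "eval_utils,household_id,12345,0_CARS,@df.num_drivers==1,-1.05,NA\n",
        header + "raw_utilities,household_id,12345,0_CARS,NA,2.43,NA\n",
        header + "base_probabilities,household_id,12345,0_CARS,NA,0.775,NA\n",
        header + "choices,household_id,12345,NA,NA,0,NA\n",
    ]
    print(values_by_alternative(concat_traces(files), 12345))
```

The same concatenated rows could equally be loaded into a dataframe or a database table for the visualization; the point is that no per-file parsing logic is needed.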
Additional context
None