Feature: Standardize Trace Output Schema to Simplify Reduction Tasks

Open DavidOry opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe.

Main Roads Western Australia wants to build a simple data visualization that makes the ActivitySim trace output more useful for debugging modifications to the donor model they are building from. Small changes to the format of these files, specifically writing each file to a fixed schema, would make this work much easier. The current schemas for an example nested logit simulation are as follows:

  1. base_probabilities: alternative, probability

  2. choices: [chooser_id], [choice] (for example: household_id, auto_ownership)

  3. choosers: label, value

  4. eval_utils: Expression, label, 0

  5. exp_utilities: alternative, utility (though the first row is the chooser_id)

  6. nested_probabilities: alternative, probability (though the first row is the chooser_id)

  7. rands: [chooser_id], rand

  8. raw_utilities: alternative, utility (though the first row is the chooser_id)

The inconsistency in these file structures requires unnecessary and messy data manipulation to consolidate this information into a single, rational database suitable for data visualization.

Describe the solution you'd like

I recommend the following standard schema for all the debug files:

trace_dimension, chooser_variable, chooser_id, alternative, trace_label, trace_value, trace_note

Examples of what each of the above files would look like (header row, then one example row) are as follows:

  1. base_probabilities
     trace_dimension, chooser_variable, chooser_id, alternative, trace_label, trace_value, trace_note
     base_probabilities, household_id, 12345, 0_CARS, NA, 0.775, NA

  2. choices
     trace_dimension, chooser_variable, chooser_id, alternative, trace_label, trace_value, trace_note
     choices, household_id, 12345, NA, NA, 0, NA

  3. choosers
     trace_dimension, chooser_variable, chooser_id, alternative, trace_label, trace_value, trace_note
     choosers, household_id, 12345, NA, TAZ, 123, NA

  4. eval_utils
     trace_dimension, chooser_variable, chooser_id, alternative, trace_label, trace_value, trace_note
     eval_utils, household_id, 12345, 0_CARS, @df.num_drivers==1, -1.05, NA

  5. exp_utilities
     trace_dimension, chooser_variable, chooser_id, alternative, trace_label, trace_value, trace_note
     exp_utilities, household_id, 12345, 0_CARS, NA, 7891.0, NA

  6. nested_probabilities
     trace_dimension, chooser_variable, chooser_id, alternative, trace_label, trace_value, trace_note
     nested_probabilities, household_id, 12345, zero_car_nest, NA, 0.850, NA

  7. rands
     trace_dimension, chooser_variable, chooser_id, alternative, trace_label, trace_value, trace_note
     rands, household_id, 12345, NA, NA, 0.123, NA

  8. raw_utilities
     trace_dimension, chooser_variable, chooser_id, alternative, trace_label, trace_value, trace_note
     raw_utilities, household_id, 12345, 0_CARS, NA, 2.43, NA
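To make the proposed schema concrete, here is a minimal sketch of how rows could be built and written with Python's standard `csv` module. The names `TRACE_COLUMNS`, `make_trace_row`, and `write_trace` are hypothetical and are not part of ActivitySim; the column order and the use of NA for fields that do not apply are taken from the examples above.

```python
import csv
import io

# Column order taken from the schema proposed in this issue.
TRACE_COLUMNS = [
    "trace_dimension", "chooser_variable", "chooser_id",
    "alternative", "trace_label", "trace_value", "trace_note",
]


def make_trace_row(trace_dimension, chooser_variable, chooser_id, trace_value,
                   alternative=None, trace_label=None, trace_note=None):
    """Hypothetical helper: build one row, writing NA where a field does not apply."""
    values = [trace_dimension, chooser_variable, chooser_id,
              alternative, trace_label, trace_value, trace_note]
    return {col: ("NA" if val is None else val)
            for col, val in zip(TRACE_COLUMNS, values)}


def write_trace(rows, handle):
    """Hypothetical helper: write rows to any file-like object using the fixed schema."""
    writer = csv.DictWriter(handle, fieldnames=TRACE_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Two rows that reuse the example values from the list above.
rows = [
    make_trace_row("base_probabilities", "household_id", 12345, 0.775,
                   alternative="0_CARS"),
    make_trace_row("choosers", "household_id", 12345, 123, trace_label="TAZ"),
]

buffer = io.StringIO()
write_trace(rows, buffer)
trace_csv = buffer.getvalue()
print(trace_csv)
```

Because every trace dimension goes through the same helper, a new kind of trace output would only need to decide which fields it fills in; the file layout itself would not change.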

Describe alternatives you've considered

A module could be written to consume the existing trace output and consolidate it into a single, rational database that could be used with data visualization software. I think a standard schema would be better, as it would allow additional trace output files to be created and written. With the output structured in this way, straightforward concatenations, joins, and other reductions can be used to consolidate the output, as needed.
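As a rough illustration of the kind of reduction a shared schema would allow, the sketch below concatenates two standardized trace tables and pivots them to one record per chooser and alternative. The in-memory strings stand in for files and reuse the example values from this issue; `read_trace` is a hypothetical helper, not an ActivitySim function.

```python
import csv
import io
from collections import defaultdict

HEADER = ("trace_dimension,chooser_variable,chooser_id,alternative,"
          "trace_label,trace_value,trace_note\n")

# In-memory stand-ins for two standardized trace files (assumed contents).
RAW_UTILITIES = HEADER + "raw_utilities,household_id,12345,0_CARS,NA,2.43,NA\n"
BASE_PROBABILITIES = HEADER + "base_probabilities,household_id,12345,0_CARS,NA,0.775,NA\n"


def read_trace(text):
    """Hypothetical helper: parse one standardized trace table into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Concatenation: every table has the same columns, so rows can simply be appended.
all_rows = []
for text in (RAW_UTILITIES, BASE_PROBABILITIES):
    all_rows.extend(read_trace(text))

# Reduction: one record per (chooser_id, alternative), one entry per trace_dimension.
wide = defaultdict(dict)
for row in all_rows:
    key = (row["chooser_id"], row["alternative"])
    wide[key][row["trace_dimension"]] = float(row["trace_value"])

for key, values in wide.items():
    print(key, values)
```

The same loop would work unchanged for any number of trace files, which is the main advantage over writing a separate parser for each of the current layouts.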

Additional context

None

DavidOry, Jul 14 '23 20:07