activitysim
Indiscriminate conversion of string fields to categorical is problematic
Describe the bug
Most but not all fields initially encoded as strings are actually categorical. When they are categorical, conversion to an explicit categorical type is efficient. However, if they are not categorical (e.g. escort tour participants), or are only loosely categorical with potentially many categories (e.g. vehicle type / age / fuel), the conversion to an explicit categorical type is not efficient.
In particular, converting non-categorical data to categorical ruins sharrow performance by triggering excessive recompiling, because each distinct categorical encoding (i.e. each distinct set of categories) is treated as a unique data type. This means, for example, that if a "categorical" escort tour participants data column appears in a chooser table, recompiling will happen nearly every time the model runs.
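To illustrate the data type issue at the pandas level, here is a minimal sketch. The column values are made up for illustration and are not taken from ActivitySim; the sketch only shows that two categorical columns built from different value sets carry unequal dtypes, while a column with a fixed, declared set of categories keeps the same dtype across batches. How sharrow keys its compilation cache on these dtypes is not shown here.

```python
import pandas as pd

# Hypothetical "escort participants" values from two different model runs.
# The strings are illustrative only.
batch_a = pd.Series(["1_2", "1_3", "2_3"]).astype("category")
batch_b = pd.Series(["1_2", "1_4", "3_4"]).astype("category")

# Each batch infers its own categories, so the resulting dtypes differ.
unstable_same = batch_a.dtype == batch_b.dtype

# A genuinely categorical field has a small, fixed set of categories that
# can be declared up front, so the dtype is identical for every batch.
mode_dtype = pd.CategoricalDtype(["walk", "bike", "drive"])
modes_a = pd.Series(["walk", "drive"]).astype(mode_dtype)
modes_b = pd.Series(["bike"]).astype(mode_dtype)
stable_same = modes_a.dtype == modes_b.dtype

print("inferred categories give equal dtypes:", unstable_same)
print("declared categories give equal dtypes:", stable_same)
```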
A fix will require not converting these fields to categorical data types.
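One possible shape for such a fix is sketched below. This is an assumption about how selective conversion could look, not ActivitySim's actual implementation: the helper `categorize_selectively`, its `max_categories` threshold, and the `skip` list are all hypothetical names introduced for illustration.

```python
import pandas as pd


def categorize_selectively(df, max_categories=20, skip=()):
    """Convert string columns to categorical only when that looks worthwhile.

    Hypothetical helper: columns named in `skip`, and string columns with
    more than `max_categories` distinct values, are left as strings.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if name in skip:
            continue
        # Only consider columns that hold strings.
        is_stringy = pd.api.types.is_object_dtype(col) or pd.api.types.is_string_dtype(col)
        if not is_stringy:
            continue
        if col.nunique(dropna=True) <= max_categories:
            out[name] = col.astype("category")
    return out


# Illustrative data only.
example = pd.DataFrame(
    {
        "tour_mode": ["walk", "bike", "drive", "walk", "drive"],
        "escort_participants": ["1_2", "1_3", "2_3", "1_4", "3_4"],
        "age": [34, 12, 45, 8, 61],
    }
)
result = categorize_selectively(example, max_categories=3)
print(result.dtypes)
```

A cardinality threshold is only a heuristic. An explicit list of columns to leave unconverted (the `skip` argument here) may be more reliable for fields such as escort tour participants that are known not to be categorical.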
This is quite possibly the problem in #756.