Duplicate column names within parking location choice model

Open jpn-- opened this issue 2 years ago • 0 comments

The parking location choice model is making two passes through the logit.interaction_dataset function.

The first is here: https://github.com/ActivitySim/activitysim/blob/1e4ffc5cdd745c183b2b50385e457c8cfaa6c18d/activitysim/abm/models/parking_location_choice.py#L169

The second is embedded in the interaction_sample_simulate function called here: https://github.com/ActivitySim/activitysim/blob/1e4ffc5cdd745c183b2b50385e457c8cfaa6c18d/activitysim/abm/models/parking_location_choice.py#L126

The logit.interaction_dataset function is hard-coded to add a "_chooser" suffix, as needed, to columns on the chooser side of the interaction merge, to disambiguate column names when there are duplicates. However, this can cause clashes when the same duplicate is found on both passes, because the result then contains more than one column named "x_chooser".
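To illustrate the naming clash, here is a minimal pandas sketch. It is not the ActivitySim implementation: `merge_with_chooser_suffix` is a hypothetical stand-in that only mimics the suffixing behavior described above, and the column name `x` and the toy data are assumptions. It shows only how the names collide and says nothing about whether the data in the real model match.

```python
import pandas as pd


def merge_with_chooser_suffix(choosers, alternatives):
    """Hypothetical stand-in for the suffixing behavior (not ActivitySim code).

    Cross-joins choosers with alternatives, appending "_chooser" to any
    chooser column whose name also appears in the alternatives table.
    """
    clashes = [c for c in choosers.columns if c in alternatives.columns]
    renamed = choosers.rename(columns={c: f"{c}_chooser" for c in clashes})

    n_alts = len(alternatives)
    # Repeat each chooser row once per alternative (positional, so it is
    # safe even when column names are already duplicated).
    row_positions = [i for i in range(len(renamed)) for _ in range(n_alts)]
    left = renamed.iloc[row_positions].reset_index(drop=True)
    # Tile the alternatives once per chooser.
    right = pd.concat([alternatives] * len(choosers), ignore_index=True)
    return pd.concat([left, right], axis=1)


choosers = pd.DataFrame({"x": [1, 2]})
alternatives = pd.DataFrame({"x": [10, 20, 30]})

# First pass: the chooser-side "x" becomes "x_chooser".
first = merge_with_chooser_suffix(choosers, alternatives)
print(list(first.columns))   # ['x_chooser', 'x']

# Second pass: the result of the first pass is used as the chooser table,
# so its "x" is renamed to "x_chooser" again, next to the existing one.
second = merge_with_chooser_suffix(first, alternatives)
print(list(second.columns))  # ['x_chooser', 'x_chooser', 'x']
```

Note that pandas does not raise on the duplicated name by default, so a clash like this can pass silently until something downstream selects the column by name and gets a DataFrame back instead of a Series.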

After a cursory review and testing, I believe the columns that are created with duplicate names actually contain duplicate data, but the flow of data in the parking location model is complicated and an additional review is warranted (@i-am-sijia since you've recently been working on fixing this model you may be the best positioned to check this). If the data is indeed systematically duplicated, we should avoid creating the duplicate columns, for memory efficiency. If not, this is a serious problem that should be fixed. A more careful review (and testing regime) is needed to confirm that the former is sufficient.
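One way such a review could check this is sketched below. It is a rough, assumed approach rather than an existing ActivitySim utility: `duplicated_columns_match` and `drop_duplicate_columns` are hypothetical helpers, and the toy frame stands in for the real interaction dataset.

```python
import pandas as pd


def duplicated_columns_match(df):
    """For each repeated column name, report whether all copies are identical."""
    result = {}
    dup_names = df.columns[df.columns.duplicated()].unique()
    for name in dup_names:
        # Boolean mask selects every column carrying this name.
        block = df.loc[:, df.columns == name]
        first = block.iloc[:, 0]
        result[name] = all(
            first.equals(block.iloc[:, i]) for i in range(1, block.shape[1])
        )
    return result


def drop_duplicate_columns(df):
    """Keep only the first column for each name (safe only if copies match)."""
    return df.loc[:, ~df.columns.duplicated()]


# Toy data: both "x_chooser" columns hold the same values.
df = pd.DataFrame(
    [[1, 1, 5], [2, 2, 6]],
    columns=["x_chooser", "x_chooser", "x"],
)
report = duplicated_columns_match(df)
if all(report.values()):
    df = drop_duplicate_columns(df)
```

A check like this could run as an assertion in a test so that any case where same-named columns diverge is flagged rather than silently dropped.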

jpn-- · Jan 02 '23 17:01