matsim-libs
Why are trip times converted to HH:MM:SS format?
I was wondering if there is a specific reason to convert all trip times (departure, waiting, travel) to HH:MM:SS format when writing a trip CSV file. This becomes annoying to read in R/Python, especially in cases where the simulation time goes beyond 24 hours. It is not unmanageable, just more complicated than if the value were simply seconds after midnight. The user could then format that however they want. But maybe I am missing something.
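To illustrate the extra step involved, here is a minimal Python sketch of parsing such a value by hand. The helper name `hms_to_seconds` is made up for this example and is not part of MATSim. Standard time parsers such as `datetime.strptime` with `%H` only accept hours from 0 to 23, so a value past 24 hours has to be split manually.

```python
def hms_to_seconds(value: str) -> int:
    """Convert an 'HH:MM:SS' string to seconds after midnight.

    The hours field may exceed 23, as happens when the simulation
    runs past 24 hours.
    """
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


print(hms_to_seconds("27:15:30"))  # 98130
```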
https://github.com/matsim-org/matsim-libs/blob/b3da8269369d10d7ef1dc92fc1906ca3bee2ad13/matsim/src/main/java/org/matsim/analysis/TripsAndLegsCSVWriter.java#L191
I think this was because the use case of trips.csv was not fully clear at the time. We assumed it was more about looking at trips.csv directly, so we used HH:MM:SS like plans.xml does. HH:MM:SS is much better for human readability. Now it seems people rather use the file as input for further analysis in R/Python and actually use the field departureTime for something that needs to be interpreted as a time object rather than carried along as a character vector. In that case seconds only is easier, and possibly faster, to read in. I also once thought about switching to seconds. From my side, if nobody is unhappy with seconds only, go ahead.
I guess it is a case of perspective, @balacmi, as to what is annoying ;-)
I/we too use the trips as a quick reference to see if there are, for example, excessively long walking trips. Then 03:25:15 (as a travel time) is much easier to interpret (quickly) than 12 315 s. Consequently, I would support the current format. But I am not against using seconds. It just means the *trips.csv.gz file becomes quite useless (for us) to eyeball results, and we will first have to read/parse/convert it in R to make sense of it.
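For reference, the reverse conversion is short if the file were to hold plain seconds. This is only a sketch in Python: `seconds_to_hms` is a hypothetical helper, and it assumes non-negative whole seconds.

```python
def seconds_to_hms(total_seconds: int) -> str:
    """Format non-negative whole seconds as 'HH:MM:SS'.

    Hours are not wrapped at 24, so values past midnight stay readable.
    """
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(seconds_to_hms(12315))  # 03:25:15
```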
Yes, I agree with you, @JWJoubert, that it is a matter of perspective. On the other hand, I am not sure how much easier it is to spot 03:25:15 than a large number like 12315. I can only see a reason to go through the trips file manually if the file is small. In that case, anyone who wants a visual inspection can easily load it in Excel and sort the values to look for outliers. For large-scale simulations (and this is what MATSim is for, right?) I would think that R/Python inspections are definitely preferred. We could also add additional columns that hold these values formatted. However, this could lead to files that are too large. Therefore, I would still opt for a simple seconds representation.
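As a rough sketch of the scripted inspection I have in mind, the following Python uses only the standard library to flag long trips. The sample data, the column names `trip_id` and `trav_time`, and the `;` delimiter are assumptions made for this example, so check them against the actual file.

```python
import csv
import io

# Made-up sample rows; a real run would read the trips file instead.
SAMPLE = """trip_id;trav_time
t1;00:12:40
t2;03:25:15
t3;00:05:10
"""


def to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def long_trips(csv_text, threshold_s, column="trav_time", delimiter=";"):
    """Return (trip_id, seconds) pairs above the threshold, longest first."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = [(row["trip_id"], to_seconds(row[column])) for row in reader]
    flagged = [row for row in rows if row[1] > threshold_s]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


print(long_trips(SAMPLE, threshold_s=3600))  # [('t2', 12315)]
```

With a seconds-only column, the `to_seconds` step would reduce to a plain `int(...)` call.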