siri data should reflect changes in OriginAimedDepartureTime / siri_ride__scheduled_start_time

Open OriHoch opened this issue 2 years ago • 4 comments

reproduction steps

  • Get data for siri vehicle location (recorded_at_time_from/to=2022-04-03 10:00:08+03:00, line_ref=5164, operator_ref=3)
    • https://open-bus-stride-api.hasadna.org.il/siri_vehicle_locations/list?recorded_at_time_from=2022-04-03%2010%3A00%3A08%2B03%3A00&recorded_at_time_to=2022-04-03%2010%3A00%3A08%2B03%3A00&order_by=id%20asc&siri_routes__line_ref=5164&siri_routes__operator_ref=3
    • check the value of siri_ride__scheduled_start_time
  • Get the related siri snapshot from S3 (siri_snapshot__snapshot_id=2022/04/03/07/00)
    • download the snapshot
    • extract (brotli -d 00.br)
    • find the relevant vehicle location record in the snapshot, for example by searching for the longitude value 35.221676
      • jq . 00 | grep -C 15 35.221676
    • Check the value of OriginAimedDepartureTime
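
The reproduction steps above can also be scripted. The following is a minimal sketch, not project code: `build_locations_url` rebuilds the API query from the first step, and `find_departure_times` stands in for the `grep` step by walking the decompressed snapshot JSON. Both function names are made up for illustration, and the sketch assumes only that the record holding `OriginAimedDepartureTime` also contains the longitude somewhere beneath it.

```python
import json
from urllib.parse import quote, urlencode

API_BASE = "https://open-bus-stride-api.hasadna.org.il"


def build_locations_url(recorded_at, line_ref, operator_ref):
    """Build the siri_vehicle_locations/list URL used in the first step."""
    params = {
        "recorded_at_time_from": recorded_at,
        "recorded_at_time_to": recorded_at,
        "order_by": "id asc",
        "siri_routes__line_ref": line_ref,
        "siri_routes__operator_ref": operator_ref,
    }
    # quote_via=quote encodes spaces as %20, matching the URL in the issue.
    return f"{API_BASE}/siri_vehicle_locations/list?{urlencode(params, quote_via=quote)}"


def contains_value(node, needle):
    """Return True if any leaf value under node equals needle (compared as text)."""
    if isinstance(node, dict):
        return any(contains_value(v, needle) for v in node.values())
    if isinstance(node, list):
        return any(contains_value(v, needle) for v in node)
    return str(node) == needle


def find_departure_times(node, needle):
    """Yield OriginAimedDepartureTime from every dict whose subtree contains needle."""
    if isinstance(node, dict):
        if "OriginAimedDepartureTime" in node and contains_value(node, needle):
            yield node["OriginAimedDepartureTime"]
        for value in node.values():
            yield from find_departure_times(value, needle)
    elif isinstance(node, list):
        for value in node:
            yield from find_departure_times(value, needle)


if __name__ == "__main__":
    print(build_locations_url("2022-04-03 10:00:08+03:00", 5164, 3))
    # After `brotli -d 00.br`, the snapshot could be searched like this:
    # with open("00") as f:
    #     print(list(find_departure_times(json.load(f), "35.221676")))
```

Walking the parsed JSON avoids depending on the exact nesting of the snapshot, which is why the sketch does not hard-code a path to the vehicle location.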

expected

  • value of siri_ride__scheduled_start_time from DB should be the same as value of OriginAimedDepartureTime from the siri snapshot data

actual

  • value of siri_ride__scheduled_start_time from DB: 2022-04-03T04:00:00+00:00
  • value of OriginAimedDepartureTime from the siri snapshot data (converted to same timezone): 2022-04-03T03:54:00+00:00
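
To make the gap concrete, the two values can be compared directly. This is a small sketch using only the Python standard library; the fixed +03:00 offset is the one used in the query above, not a general Israel timezone rule.

```python
from datetime import datetime, timedelta, timezone

# Values reported in the issue, both expressed in UTC.
db_value = datetime.fromisoformat("2022-04-03T04:00:00+00:00")
snapshot_value = datetime.fromisoformat("2022-04-03T03:54:00+00:00")

# Fixed +03:00 offset, as in the recorded_at_time_from/to query parameters.
ISRAEL_OFFSET = timezone(timedelta(hours=3))

gap = db_value - snapshot_value
print(gap)  # 0:06:00
print(db_value.astimezone(ISRAEL_OFFSET).strftime("%H:%M"))  # 07:00
print(snapshot_value.astimezone(ISRAEL_OFFSET).strftime("%H:%M"))  # 06:54
```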

OriHoch avatar Apr 18 '22 05:04 OriHoch

The OriginAimedDepartureTime for that ride changed. In snapshot 2022/04/03/04/54 it was 07:00 (Israel) and in later snapshots it changed to 06:54 (Israel).

Our code takes the first OriginAimedDepartureTime it encounters for a ride and applies it to all later vehicle locations of that ride. We don't keep track of changes in OriginAimedDepartureTime.
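
The "first value wins" behavior can be illustrated with a toy model. This is not the actual open-bus code: the class name, the ride key, and the journey reference are hypothetical, and only the two departure times come from this issue.

```python
class FirstSeenScheduledTimes:
    """Toy model: remember the first OriginAimedDepartureTime seen per ride."""

    def __init__(self):
        self._first_seen = {}

    def scheduled_start_time(self, ride_key, origin_aimed_departure_time):
        # setdefault stores the value only if the ride is new,
        # so later changes for the same ride are silently ignored.
        return self._first_seen.setdefault(ride_key, origin_aimed_departure_time)


cache = FirstSeenScheduledTimes()
ride = (3, 5164, "hypothetical-journey-ref")  # operator_ref, line_ref, made-up journey id

early = cache.scheduled_start_time(ride, "2022-04-03T07:00:00+03:00")
later = cache.scheduled_start_time(ride, "2022-04-03T06:54:00+03:00")

print(early)  # 2022-04-03T07:00:00+03:00
print(later)  # 2022-04-03T07:00:00+03:00, the 06:54 update is dropped
```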

OriHoch avatar Apr 18 '22 05:04 OriHoch

@EyalBerger wrote:

I think that, from an analytical point of view, this ride should be recorded as two rides: one with an OriginAimedDepartureTime of 07:00 and one with 06:54, for two reasons:

We want to allow users to load SIRI data as it is in the original records. Operators could make those changes for different purposes, and if we want to explore their nature and patterns, we need the data to be identical to the SIRI source files.
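
One way to picture the proposal is to make OriginAimedDepartureTime part of the ride key, so a change in its value starts a new ride. The sketch below is an illustration under that assumption only; `journey_ref` is a hypothetical field name and the records are made up apart from the two departure times.

```python
from collections import defaultdict


def split_rides(records):
    """Group vehicle location records by (journey_ref, OriginAimedDepartureTime)."""
    rides = defaultdict(list)
    for record in records:
        key = (record["journey_ref"], record["OriginAimedDepartureTime"])
        rides[key].append(record)
    return dict(rides)


records = [
    {"journey_ref": "hypothetical-1", "OriginAimedDepartureTime": "07:00", "seq": 1},
    {"journey_ref": "hypothetical-1", "OriginAimedDepartureTime": "06:54", "seq": 2},
    {"journey_ref": "hypothetical-1", "OriginAimedDepartureTime": "06:54", "seq": 3},
]

rides = split_rides(records)
print(len(rides))  # 2
```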

OriHoch avatar Apr 18 '22 05:04 OriHoch

This would greatly increase the complexity of our processes and DB structure, so we will keep it in the backlog for now.

Users who want the full details can always download the source SIRI data. Another option is to add an API method that makes that process easier.

OriHoch avatar Apr 18 '22 05:04 OriHoch

Thanks for the summary. I think it's important to reflect the source. I suggest discussing it with the team at the next internal meeting, to get a better understanding of how to prioritize this given the complexity involved.

EyalBerger avatar Apr 21 '22 09:04 EyalBerger