open-bus
siri data should reflect changes in OriginAimedDepartureTime / siri_ride__scheduled_start_time
reproduction steps
- Get data for siri vehicle location (recorded_at_time_from/to=2022-04-03 10:00:08+03:00, line_ref=5164, operator_ref=3)
  - https://open-bus-stride-api.hasadna.org.il/siri_vehicle_locations/list?recorded_at_time_from=2022-04-03%2010%3A00%3A08%2B03%3A00&recorded_at_time_to=2022-04-03%2010%3A00%3A08%2B03%3A00&order_by=id%20asc&siri_routes__line_ref=5164&siri_routes__operator_ref=3
- check the value of siri_ride__scheduled_start_time
- Get the related siri snapshot from S3 (siri_snapshot__snapshot_id=2022/04/03/07/00)
  - download the snapshot
  - extract (brotli -d 00.br)
  - look for the relevant vehicle location record in the snapshot, can search using the geo lon value 35.221676
    - cat 00 | jq | grep -C 15 35.221676
  - Check the value of OriginAimedDepartureTime
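The reproduction steps above can be sketched in Python. This is a minimal sketch, not project code: the helper names `build_query_url` and `find_departure_times` are hypothetical, and the search assumes only that the snapshot is JSON and that the longitude and OriginAimedDepartureTime of a record sit inside the same JSON object (which is what the `grep -C 15` step implies). It does not assume any particular nesting.

```python
import json
from urllib.parse import quote, urlencode

API_URL = "https://open-bus-stride-api.hasadna.org.il/siri_vehicle_locations/list"


def build_query_url(recorded_at, line_ref, operator_ref):
    """Build the same list query as the URL in the reproduction steps."""
    params = {
        "recorded_at_time_from": recorded_at,
        "recorded_at_time_to": recorded_at,
        "order_by": "id asc",
        "siri_routes__line_ref": line_ref,
        "siri_routes__operator_ref": operator_ref,
    }
    # quote_via=quote encodes spaces as %20 (not "+"), matching the issue URL.
    return API_URL + "?" + urlencode(params, quote_via=quote)


def find_departure_times(node, lon_text):
    """Collect OriginAimedDepartureTime values from every JSON object that
    has that key and mentions lon_text somewhere inside it.

    Assumption: the longitude and the departure time belong to the same
    object; the real snapshot layout is not taken for granted here.
    """
    found = []
    if isinstance(node, dict):
        if "OriginAimedDepartureTime" in node and lon_text in json.dumps(node):
            found.append(node["OriginAimedDepartureTime"])
        for value in node.values():
            found.extend(find_departure_times(value, lon_text))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_departure_times(item, lon_text))
    return found
```

With the decompressed snapshot file `00` from the steps above, `find_departure_times(json.load(open("00")), "35.221676")` would list the candidate values to compare against siri_ride__scheduled_start_time from the API response.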
expected
- value of siri_ride__scheduled_start_time from DB should be the same as value of OriginAimedDepartureTime from the siri snapshot data
actual
- value of siri_ride__scheduled_start_time from DB: 2022-04-03T04:00:00+00:00
- value of OriginAimedDepartureTime from the siri snapshot data (converted to same timezone): 2022-04-03T03:54:00+00:00
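The timezone conversion behind this comparison can be checked with a few lines of Python. This is a small sketch: the DB value is taken from the issue as-is, while the snapshot string is an assumed ISO 8601 rendering of 06:54 Israel time with the +03:00 offset used elsewhere in this issue.

```python
from datetime import datetime, timezone


def to_utc(iso_text):
    """Parse an ISO 8601 timestamp with an offset and normalize it to UTC."""
    return datetime.fromisoformat(iso_text).astimezone(timezone.utc)


# Value reported by the DB (already in UTC).
db_value = to_utc("2022-04-03T04:00:00+00:00")
# Assumed raw form of the snapshot value: 06:54 local time, offset +03:00.
snapshot_value = to_utc("2022-04-03T06:54:00+03:00")

# Difference between what the DB holds and what the snapshot says.
difference = db_value - snapshot_value
```

Normalizing both sides to UTC before comparing avoids mistaking the three-hour offset for a real difference; what remains is the six-minute gap described in this issue.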
The OriginAimedDepartureTime for that ride changed. In snapshot 2022/04/03/04/54 it was 07:00 (Israel) and in later snapshots it changed to 06:54 (Israel).
Our code takes the first OriginAimedDepartureTime it encounters for a ride and applies it to all later vehicle locations of that ride. We don't keep track of changes in OriginAimedDepartureTime.
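To illustrate the behavior described above, here is a hypothetical sketch. It is not the project's actual code; the class names, the `ride_key` argument, and the plain-string times are all assumptions made for illustration. The first class keeps only the first value seen for a ride, and the second shows one possible way to keep the history of changes.

```python
class FirstSeenRides:
    """Keeps the first scheduled start time seen per ride (behavior described above)."""

    def __init__(self):
        self._scheduled = {}

    def scheduled_start(self, ride_key, origin_aimed_departure_time):
        # setdefault stores the value only if the ride is new; later values are ignored.
        return self._scheduled.setdefault(ride_key, origin_aimed_departure_time)


class ChangeTrackingRides:
    """Hypothetical alternative: remembers every distinct value in arrival order."""

    def __init__(self):
        self._history = {}

    def scheduled_start(self, ride_key, origin_aimed_departure_time):
        history = self._history.setdefault(ride_key, [])
        # Append only when the value differs from the most recent one.
        if not history or history[-1] != origin_aimed_departure_time:
            history.append(origin_aimed_departure_time)
        return origin_aimed_departure_time

    def history(self, ride_key):
        return list(self._history.get(ride_key, []))
```

Feeding the two values from this issue (07:00, then 06:54) into `FirstSeenRides` keeps returning 07:00, while `ChangeTrackingRides` returns the current value and retains both in its history.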
@EyalBerger wrote:
I think that from an analytical point of view this ride should be recorded as two rides: one with an OriginAimedDepartureTime of 07:00 and one of 06:54, for two reasons:
We want to allow users to load SIRI data as it is in the original records. Those changes could be made by operators for different purposes, and if we want to explore their nature and patterns we need the data to be identical to the SIRI source files.
This greatly increases the complexity of our processes and DB structure, so we will keep it in the backlog for now.
Users who want to can always download the source SIRI data and see all the details. Another option is to add an API method that makes that process easier.
Thanks for the summary. I think it's important to reflect the source. I suggest discussing it with the team at the next internal meeting to get a better understanding of how to prioritize this, taking into account the complexity here.