📄🚀 Accommodate other types of instantaneous events
Describe the feature you want and how it meets your needs or solves a problem
As a software developer who works with a range of data sets from various transit agencies and ITS vendors, I’d like the TIDES data model to handle instantaneous events that don’t clearly fit into any of the existing tables.
TIDES currently has three tables for storing instantaneous events:
- fare_transactions, for AFC events such as fare validations or ticket purchases.
- vehicle_locations, for location snapshots of transit vehicles (a.k.a. “heartbeats” or “breadcrumbs”).
- passenger_events, for various events that occur during a stop_visit but that aren't fare transactions (such as APC activity or bike rack deployments).
Putting aside the names of these tables for a moment and looking at them more abstractly, each contains instantaneous events with the following differences:
- fare_transactions relates to an AFC device and optionally a vehicle, records no location information, and has several fields specific to fare transactions.
- vehicle_locations relates to a vehicle and contains spatial coordinates and/or odometer information.
- passenger_events relates to a stop_visit (indirectly, by a vehicle and time and/or stop sequence), a type of event (from an enum), and records no location information because it can be gleaned from the associated stop_visit.
These tables capture some important types of events, but there are other useful instantaneous events that don’t clearly fit into any of these tables. For example:
- Passenger-related events that don’t entail fare transactions or vehicles, such as gate or turnstile entries or exits.
- Vehicle-related events that are recorded by stationary equipment and aren’t necessarily relatable to a vehicle ID, such as a track circuit or signal block recording the passage of a train. (Such systems often reference some internal, temporary train identifier that differs from the persistent railcar identifiers).
- Onboard passenger-related events that occur between stop_visits and therefore have meaningful location information. For example, a passenger requesting a stop or the AVL system announcing the name of an upcoming stop (both of which can be ingredients for route, pattern, or stop-visit inference when that information isn't directly provided in the data set).
- Log entries from various systems, such as a driver logging into an AVL system, a station gate being rebooted, or a station escalator turning on or reversing direction.
Describe the solution you'd like
I’d like some way of accommodating the record types mentioned above and, more generally, other types of instantaneous transit records that other users may require now or in the future. I think this could be achieved by redefining or modifying the vehicle_locations and passenger_events tables. (I think fare_transactions should remain relatively unchanged, as it has several fields suited to a very specific kind of event.)
Describe alternatives you've considered
1. Replace vehicle_locations and passenger_events with two somewhat more flexible tables: one for events generated by vehicles or vehicle-mounted devices and the other for events generated by stationary devices.
- Pro: This accommodates record types that aren’t currently accommodated.
- Cons:
- Segregating records by vehicle-mounted vs stationary devices will force some data sources to be split into two tables. For example, logs from an AFC system would need to be split so that farebox records go to one table and vending machine and station-gate records go to another.
- This ignores mobile, non-vehicle-mounted devices such as a handheld fare validator used by a fare inspector who travels on different vehicles throughout a shift. These could be accommodated with a third, optional table (for truly mobile as opposed to vehicle-mounted devices), or the vehicle ID field could simply be made optional and these devices could record lat/lon if given.
2. Combine vehicle_locations and passenger_events into a single table that has fields for lat/lon, odometer, and references to vehicles and/or places, and name it something more generic.
- Pros:
- This avoids having to split an input data source into two tables (see the first bullet in solution 1), and can be very flexible for accommodating future record types and data sources.
- Some combined AVL/APC systems already provide data in this way, as a stream of vehicle, door, and passenger events that all record the current lat, lon, and odometer.
- Con: It would be more onerous to validate and enforce referential integrity, as different record types should ideally require different fields to be populated. For example, records that currently reside in the vehicle_locations table should ideally require a vehicle_id, while a station-gate exit record should leave that field null but might require a device ID or location ID.
3. Keep vehicle_locations, add lat/lon/odom fields to passenger_events so that it can capture events between stop visits, and add one or more additional tables to capture the other types of records mentioned above.
- Pro: Tables are more clearly suited to certain types of events, and less filtering would be required to extract certain record types from a table of many types.
- Con: This may require adding new tables as TIDES users seek to capture other types of instantaneous records, many of which will be structurally quite similar (a device and/or vehicle ID, a timestamp, some event type or message). This could be mitigated by adding one very generic table as a catch-all for all instantaneous events that don't fit into vehicle_locations or passenger_events.
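To make the validation concern under alternative 2 more concrete, here is a minimal Python sketch of per-record-type field rules for a single combined table. The event types, field names (event_type, device_id, location_id, event_timestamp), and the rules themselves are hypothetical and chosen only for illustration; none of them are defined by TIDES.

```python
# Hypothetical event types and field names, for illustration only;
# these rules are assumptions, not part of the TIDES specification.
REQUIRED_FIELDS = {
    "vehicle_location": {"vehicle_id", "event_timestamp"},
    "gate_exit": {"device_id", "location_id", "event_timestamp"},
}
# Fields that should stay null for a given event type.
NULL_FIELDS = {
    "gate_exit": {"vehicle_id"},
}


def validate_event(record):
    """Return a list of problems found in one combined-table record."""
    event_type = record.get("event_type")
    if event_type not in REQUIRED_FIELDS:
        return [f"unknown event_type: {event_type!r}"]
    problems = []
    # Each event type requires its own set of populated fields.
    for field in sorted(REQUIRED_FIELDS[event_type]):
        if record.get(field) is None:
            problems.append(f"{event_type}: missing required field {field}")
    # Some event types should leave certain fields empty.
    for field in sorted(NULL_FIELDS.get(event_type, set())):
        if record.get(field) is not None:
            problems.append(f"{event_type}: field {field} should be null")
    return problems
```

A validator along these lines has to be maintained alongside the table schema, which is the extra burden described in the con for alternative 2.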
Combine vehicle_locations and passenger_events into a single table that has fields...
This idea is the most flexible and is closely related to "event stores", which are not uncommon in event-driven architectures (Event Store background). There are many variations of this idea, but given that TIDES uses snapshotted (non-streaming) data, often single-tenant and compiled from multiple data sources (AVL, AFC, APC, etc.), data containers inside the table would allow the (mostly) mutually exclusive datasets to be combined into a single table (think JSON/JSONB inside a single column). Some things would be shared, such as Timestamp, [GU]ID, Revision, and perhaps some metadata in some cases. To make this approach work, a Frictionless Data Package profile would need to be defined for each event type.
Pros:
- This can accommodate all types of events into the future.
- Technology already exists to support it.
Cons:
- More complexity in reading from and writing to TIDES-Events.
- The community would need to write a new Data Package profile for each type of event.
- Contained events are not normalized for OLTP databases.
- Larger file sizes.
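As a rough illustration of the container idea, here is a minimal Python sketch of an event row with shared envelope fields and a type-specific payload stored as a JSON string in a single column. The column names (event_id, event_timestamp, revision, event_type, payload) and the example event types are assumptions made for this sketch, not anything TIDES defines.

```python
import json
import uuid


def make_event(event_type, event_timestamp, payload, revision=1):
    """Wrap a type-specific payload in the shared envelope fields.

    All column names here are hypothetical.
    """
    return {
        "event_id": str(uuid.uuid4()),   # shared: unique ID
        "event_timestamp": event_timestamp,  # shared: timestamp
        "revision": revision,            # shared: revision number
        "event_type": event_type,        # selects the payload's profile
        "payload": json.dumps(payload, sort_keys=True),  # container column
    }


def payloads_of_type(rows, event_type):
    """Filter rows by event type and decode their JSON payloads."""
    return [
        json.loads(row["payload"])
        for row in rows
        if row["event_type"] == event_type
    ]


rows = [
    make_event(
        "vehicle_location",
        "2024-01-01T08:00:00Z",
        {"vehicle_id": "V1", "latitude": 44.95, "longitude": -93.09},
    ),
    make_event(
        "gate_exit",
        "2024-01-01T08:00:05Z",
        {"device_id": "G7", "location_id": "STN1"},
    ),
]
gate_exits = payloads_of_type(rows, "gate_exit")
```

Readers have to filter by event type and decode the payload before they can use any type-specific field, which is the added read/write complexity noted in the cons.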
I am reluctant to generalize the existing vehicle_locations and passenger_events tables given they are so central to expected use of this data.
My proposal would be to plan for, but not yet define, three new tables in the spec:
- vehicle_events, which would be a generalized structure for events that occur on vehicles;
- location_events, which would be a generalized structure for events that occur at static locations; and
- locations, which would be a reference table for location_events, similar to devices and vehicles.
I could see evolution (in the future) of the vehicle_locations, passenger_events, and fare_transactions as derived data from the more generalized vehicle and location event data.
I suggest we add an "Extensions" page to the documentation. It should contain recommendations or best practices, derived from the principles we've used to create this spec, e.g., use informative field names that aren't reserved SQL keywords, re-use field names from other parts of the spec and link new tables to existing tables with the keys.
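One of these recommendations, avoiding reserved SQL keywords as field names, could be checked mechanically. A minimal Python sketch, assuming a small hand-picked subset of reserved words rather than a complete list for any particular SQL dialect:

```python
# A small illustrative subset of reserved SQL keywords (an assumption for
# this sketch); a real check would use the full list for the target dialect.
SQL_RESERVED_SUBSET = {
    "select", "from", "where", "table", "order", "group", "user", "check",
}


def reserved_field_names(field_names):
    """Return the field names that collide with a reserved keyword."""
    return [name for name in field_names if name.lower() in SQL_RESERVED_SUBSET]
```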
I like the Extensions page idea a lot: encouraging folk to go beyond what's documented but also indicating some suggested constraints on that activity, to flexibly allow development towards the types of future tables that @jlstpaul indicated in the comment above.
Shall we remove the v1 tag from this issue?
Added "Extending the TIDES specification" section to draft Best Practices document that will eventually be at docs/best-practices.md