GTFS-ride
Privacy issues with rider_trip.txt, recommend removing
Problem statements
- In our opinion, the rider_trip.txt file goes against the Mobility Data Privacy Principles, of which Trillium is an endorsing organization. Specifically, it violates principles #1, #5, and #6, and in its current state is at risk of violating principles #3 and #7.
- The use cases for this file are not apparent, and the ones I can think of do not justify the undue surveillance of riders’ travel patterns. In general, we would like to hear a case for this file’s inclusion that outweighs its privacy issues.
- How feasible is it to implement this feature of the spec? How would alights be recorded? How would information about a rider (e.g. `rider_type`) be generated?
- MDS has similar components that deal with the collection of rider trip data. These components have caused some very public controversy resulting in a blow to the spec’s reputation. There are valuable lessons to be learned from that history. For a discussion on rider trip data generated by GBFS and MDS and the surrounding privacy concerns, see this article.
Solutions considered
- (Recommended) Remove rider_trip.txt file from the spec altogether.
- Pros
- Solves all of the above problems.
- Simplifies the spec.
- Puts the spec in a better position for adoption (removes the practical problem of implementing this file, fewer components to debate, less controversy).
- rider_trip.txt is not currently in use, so there’s not much work lost if we were to remove it.
- Cons
- Would lose the ability for specific rider trip analysis (but as mentioned above, this is not necessarily desirable).
- (Also considered, but not recommended) Require a unique rider_id for each boarding and each alighting, and allow only a board or an alight in a single record, so that the start and end points of a single rider’s trip cannot be collected together. This is similar to a solution regarding vehicle IDs that GBFS implemented as a response to privacy concerns.
- Pros
- Retains the file while somewhat reducing the impact on rider privacy (would still include boarding and alighting info of a rider, but those data would be disconnected from one another).
- Cons
- Amount of data collected about riders is still superfluous.
- While more difficult, still reasonably easy to re-identify a rider because there could still be matching fields between the parsed board and alight fields (e.g. `rider_type`, `transaction_type`, `fare_media`, etc.).
- Would lose the ability for analysis of start-to-finish individual rider trips.
- (Also considered, but not recommended) Remove all of the alight fields, rename file to rider_boardings.txt
- Pros
- Retains the file while somewhat reducing the impact on rider privacy.
- Cons
- Amount of data collected about riders is still superfluous.
- Would lose the ability for analysis of start-to-finish individual rider trips (would only show boarding information per rider).
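To illustrate the re-identification concern with the split board/alight option, here is a minimal sketch, not a description of any existing tool. It assumes hypothetical records stored as dictionaries, with a different `rider_id` on each boarding and alighting record. The field names `rider_type`, `transaction_type`, and `fare_media` come from the spec; the values, the record layout, and the `relink` helper are made up for the example.

```python
from collections import defaultdict

# Fields that would still appear on both the boarding and the alighting record.
QUASI_IDENTIFIERS = ("rider_type", "transaction_type", "fare_media")


def group_by_fields(records):
    """Group rider_ids by their combination of quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[field] for field in QUASI_IDENTIFIERS)
        groups[key].append(rec["rider_id"])
    return groups


def relink(boardings, alightings):
    """Pair a boarding with an alighting when their field combination is
    unique on both sides (hypothetical helper for illustration only)."""
    board_groups = group_by_fields(boardings)
    alight_groups = group_by_fields(alightings)
    links = []
    for key, board_ids in board_groups.items():
        alight_ids = alight_groups.get(key, [])
        if len(board_ids) == 1 and len(alight_ids) == 1:
            links.append((board_ids[0], alight_ids[0]))
    return links


# Made-up sample data: every record has its own unrelated rider_id.
boardings = [
    {"rider_id": "b1", "rider_type": "student", "transaction_type": "pass", "fare_media": "smartcard"},
    {"rider_id": "b2", "rider_type": "adult", "transaction_type": "cash", "fare_media": "cash"},
    {"rider_id": "b3", "rider_type": "adult", "transaction_type": "cash", "fare_media": "cash"},
]
alightings = [
    {"rider_id": "a7", "rider_type": "student", "transaction_type": "pass", "fare_media": "smartcard"},
    {"rider_id": "a8", "rider_type": "adult", "transaction_type": "cash", "fare_media": "cash"},
    {"rider_id": "a9", "rider_type": "adult", "transaction_type": "cash", "fare_media": "cash"},
]

links = relink(boardings, alightings)
print(links)
```

In this made-up data, the single student boarding can be paired with the single student alighting even though the two records share no `rider_id`, while the two cash-paying adults remain ambiguous. On a low-ridership trip, rare field combinations like this are more likely, which is the scenario the con above is worried about.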
Looking forward to discussing further!