
Privacy issues with rider_trip.txt, recommend removing

Open westontrillium opened this issue 2 years ago • 1 comments

Problem statements

  1. In our opinion, the rider_trip.txt file goes against the Mobility Data Privacy Principles, of which Trillium is an endorsing organization. Specifically, it violates principles #1, #5, and #6, and in its current state is at risk of violating principles #3 and #7.
  2. The use cases for this file are not apparent, and the ones I can think of do not justify the undue surveillance of riders’ travel patterns. In general, we would like to hear a case for this file’s inclusion that outweighs its privacy issues.
  3. How feasible is it to implement this feature of the spec? How would alights be recorded? How would information about a rider (e.g. rider_type) be generated?
  4. MDS has similar components that deal with the collection of rider trip data. These components have caused some very public controversy resulting in a blow to the spec’s reputation. There are valuable lessons to be learned from that history. For a discussion on rider trip data generated by GBFS and MDS and the surrounding privacy concerns, see this article.

Solutions considered

  • (Recommended) Remove rider_trip.txt file from the spec altogether.
    • Pros
      • Solves all of the above problems.
      • Simplifies the spec.
      • Puts the spec into a better position for adoption (removes the practical problem of implementing this file, fewer components to debate, less controversy).
      • rider_trip.txt is not currently in use, so there’s not much work lost if we were to remove it.
    • Cons
      • Would lose the ability for specific rider trip analysis (but as mentioned above, this is not necessarily desirable).
  • (Also considered, but not recommended) Require a unique rider_id for each boarding and each alighting, and allow only a boarding or an alighting in a single record, so that the start and end points of a single rider’s trip cannot be collected together. This is similar to a solution regarding vehicle ids that GBFS implemented as a response to privacy concerns.
    • Pros
      • Retains the file while somewhat reducing the impact on rider privacy (would still include boarding and alighting info of a rider, but those data would be disconnected from one another).
    • Cons
      • Amount of data collected about riders is still superfluous.
      • While more difficult, it would still be reasonably easy to re-identify a rider, because the separated boarding and alighting records could still share matching field values (e.g. rider_type, transaction_type, fare_media, etc.).
      • Would lose the ability for analysis of start-to-finish individual rider trips.
  • (Also considered, but not recommended) Remove all of the alight fields, rename file to rider_boardings.txt
    • Pros
      • Retains the file while somewhat reducing the impact on rider privacy.
    • Cons
      • Amount of data collected about riders is still superfluous.
      • Would lose the ability for analysis of start-to-finish individual rider trips (would only show boarding information per rider).
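
To make the re-identification concern with the second option concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the sample rows and their values are made up, and the field names trip_id, boarding_stop_id, and alighting_stop_id are assumed for the example (only rider_id, rider_type, transaction_type, and fare_media are named above). It splits each combined record into a boarding record and an alighting record with unrelated rider_id values, then shows that any rider whose combination of remaining fields is unique can be re-paired anyway.

```python
import uuid
from collections import defaultdict

# Fields that would remain on both halves of a split record.
# Field names are assumptions for illustration, not a statement of the spec.
QUASI_IDENTIFIERS = ("trip_id", "rider_type", "transaction_type", "fare_media")


def split_record(record):
    """Split one combined record into a boarding and an alighting record,
    each with its own random, unrelated rider_id."""
    shared = {key: record[key] for key in QUASI_IDENTIFIERS}
    board = {"rider_id": uuid.uuid4().hex,
             "boarding_stop_id": record["boarding_stop_id"], **shared}
    alight = {"rider_id": uuid.uuid4().hex,
              "alighting_stop_id": record["alighting_stop_id"], **shared}
    return board, alight


def relink(boards, alights):
    """Re-pair boardings and alightings whose combination of shared
    fields occurs exactly once on each side."""
    def index(rows):
        groups = defaultdict(list)
        for row in rows:
            groups[tuple(row[key] for key in QUASI_IDENTIFIERS)].append(row)
        return groups

    board_index, alight_index = index(boards), index(alights)
    pairs = []
    for key, board_rows in board_index.items():
        alight_rows = alight_index.get(key, [])
        if len(board_rows) == 1 and len(alight_rows) == 1:
            pairs.append((board_rows[0]["boarding_stop_id"],
                          alight_rows[0]["alighting_stop_id"]))
    return pairs


# Made-up sample data: one rider with a distinctive combination of fields,
# and two riders who are indistinguishable from each other.
combined = [
    {"trip_id": "T1", "boarding_stop_id": "S1", "alighting_stop_id": "S4",
     "rider_type": "senior", "transaction_type": "pass", "fare_media": "card"},
    {"trip_id": "T1", "boarding_stop_id": "S2", "alighting_stop_id": "S5",
     "rider_type": "adult", "transaction_type": "cash", "fare_media": "none"},
    {"trip_id": "T1", "boarding_stop_id": "S3", "alighting_stop_id": "S6",
     "rider_type": "adult", "transaction_type": "cash", "fare_media": "none"},
]

split = [split_record(record) for record in combined]
boards = [board for board, _ in split]
alights = [alight for _, alight in split]

pairs = relink(boards, alights)
print(pairs)
```

In this sample, the one rider with a distinctive combination of fields on the trip is re-linked from boarding stop to alighting stop despite the separate rider_id values, while the two riders with identical field values are not. The fewer riders who share a combination, the less protection the split provides.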

Looking forward to discussing further!

westontrillium, May 12 '22 00:05