Improve pattern inference

Open mattwigway opened this issue 10 years ago • 1 comments

The whole idea behind patterns is that they generalize a set of trips. If we have a separate pattern for a school tripper (which makes an extra stop at a school at bell times), that kind of defeats the purpose. However, the current GTFS importer makes a pattern for each unique stop sequence. We can improve this by merging similar patterns. All we need is a similarity metric; Levenshtein distance or Damerau-Levenshtein distance would probably be appropriate. Of the two, I would lean towards the former: intuitively, trips ABCD and ACBD are more different than ABCD and ABD (A, B, ... are stops). Levenshtein distance reflects that (2 edits versus 1), whereas Damerau-Levenshtein counts the swap of adjacent stops as a single edit and scores both pairs as 1.
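To make the comparison concrete, here is a minimal sketch of both metrics in Python (not the importer's own language, just convenient for illustration). The function names `levenshtein` and `osa_distance` are made up for this example, and the second function implements the restricted "optimal string alignment" variant of Damerau-Levenshtein, which is enough to show the adjacent-swap behaviour.

```python
def levenshtein(a, b):
    """Edit distance with insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # drop stop x from a
                curr[j - 1] + 1,         # insert stop y into a
                prev[j - 1] + (x != y),  # substitute (free if the stops match)
            ))
        prev = curr
    return prev[-1]


def osa_distance(a, b):
    """Levenshtein plus swaps of adjacent stops (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition: the last two stops appear swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(levenshtein("ABCD", "ACBD"), levenshtein("ABCD", "ABD"))    # 2 1
print(osa_distance("ABCD", "ACBD"), osa_distance("ABCD", "ABD"))  # 1 1
```

Both functions accept any sequences, so lists of stop IDs work the same way as the single-letter strings used here.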

It may make sense to scale the distance by the length of the trip, perhaps by the average length of the two trips being compared. A single insertion in a three-stop trip is more significant than in a twenty-stop trip.
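A rough sketch of that scaling, assuming plain Levenshtein distance divided by the mean number of stops in the two trips. The name `normalized_distance` and the choice of the mean as the divisor are assumptions for illustration; dividing by the longer trip would be another reasonable option.

```python
def levenshtein(a, b):
    """Edit distance with insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Levenshtein distance divided by the mean length of the two stop sequences."""
    mean_len = (len(a) + len(b)) / 2
    if mean_len == 0:
        return 0.0  # two empty sequences are identical
    return levenshtein(a, b) / mean_len


# One extra stop "X" inserted into a 3-stop trip and into a 20-stop trip.
short_trip = ["s0", "s1", "s2"]
long_trip = ["s%d" % i for i in range(20)]
short_score = normalized_distance(short_trip, short_trip[:1] + ["X"] + short_trip[1:])
long_score = normalized_distance(long_trip, long_trip[:1] + ["X"] + long_trip[1:])
print(short_score > long_score)  # True
```

With this scaling the single insertion scores 1/3.5 on the short trip and 1/20.5 on the long one, so a fixed merge threshold would treat the long trips as the more similar pair.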

mattwigway, Oct 29 '14 13:10

Once we upgrade to the new GTFS loader, we can use its pattern detection algorithms to find all of the unique stop sequences, and then calculate distances only between those patterns rather than between individual trips.
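The saving can be sketched as follows. This is not the new loader's API, which is not shown here; `group_trips_by_stop_sequence` and `pairwise_distances` are hypothetical helpers, the trip data is invented, and the distance function is passed in so any metric can be plugged in.

```python
from collections import defaultdict
from itertools import combinations


def group_trips_by_stop_sequence(trips):
    """Map each unique stop sequence to the IDs of the trips that follow it."""
    groups = defaultdict(list)
    for trip_id, stops in trips.items():
        groups[tuple(stops)].append(trip_id)
    return dict(groups)


def pairwise_distances(sequences, distance):
    """Compute distance(a, b) once for every unordered pair of sequences."""
    return {(a, b): distance(a, b) for a, b in combinations(sequences, 2)}


# Invented example: three regular trips and one school tripper with extra stop "S".
trips = {
    "t1": ["A", "B", "C", "D"],
    "t2": ["A", "B", "C", "D"],
    "t3": ["A", "B", "S", "C", "D"],
    "t4": ["A", "B", "C", "D"],
}
patterns = group_trips_by_stop_sequence(trips)

trip_pairs = len(list(combinations(trips, 2)))
pattern_pairs = len(list(combinations(patterns, 2)))
print(trip_pairs, pattern_pairs)  # 6 1
```

Four trips would need six comparisons, while the two unique stop sequences need only one. The gap grows quickly on real feeds, where many trips share a stop sequence.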

mattwigway, Oct 30 '14 03:10