Come up with a way to mark URLs that are no longer functional
What is the expected behavior? Thinking specifically...
- Should they have a report for a month where this URL is marked null?
- Should their latest known data be included in our latest known data dataset (the gtfs_schedule dataset)?
Currently, I would say our data views reflect "latest known feed data", so their last downloaded feed and this URL are what come up when you query the latest data. We could support mechanisms for (1) marking an entire feed as deleted, e.g. this GTFS schedule no longer exists or is no longer relevant in any way, and (2) marking URLs that are no longer supported, e.g. this data still exists and this was the URL it came from, but there is no current URL.
Originally posted by @machow in https://github.com/cal-itp/data-infra/issues/494#issuecomment-946144413
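To make the two proposed mechanisms concrete, here is a minimal Python sketch. It is not how data-infra works today: `FeedStatus`, `FeedEntry`, `should_download`, and `in_latest_dataset` are hypothetical names, and the policy encoded here (deleted feeds drop out of the latest dataset, feeds with an unsupported URL stay in) is only one possible answer to the questions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FeedStatus(Enum):
    """Hypothetical status values for a feed entry."""
    ACTIVE = "active"                    # URL works and is downloaded as usual
    URL_UNSUPPORTED = "url_unsupported"  # data still exists, but there is no current URL
    DELETED = "deleted"                  # feed no longer exists / is no longer relevant


@dataclass
class FeedEntry:
    """Hypothetical record describing one feed and where it came from."""
    name: str
    last_known_url: str
    status: FeedStatus = FeedStatus.ACTIVE

    @property
    def current_url(self) -> Optional[str]:
        # Only an active feed has a current URL; otherwise report None (null).
        return self.last_known_url if self.status is FeedStatus.ACTIVE else None


def should_download(entry: FeedEntry) -> bool:
    """Assumed policy: only attempt downloads for active feeds."""
    return entry.status is FeedStatus.ACTIVE


def in_latest_dataset(entry: FeedEntry) -> bool:
    """Assumed policy: keep last known data unless the whole feed was deleted."""
    return entry.status is not FeedStatus.DELETED


# Example with placeholder names and URLs.
active = FeedEntry("agency-a", "https://example.com/a.zip")
stale = FeedEntry("agency-b", "https://example.com/b.zip", FeedStatus.URL_UNSUPPORTED)
gone = FeedEntry("agency-c", "https://example.com/c.zip", FeedStatus.DELETED)
```

Under these assumptions, `stale` keeps its last downloaded data in the latest dataset while its `current_url` is `None`, and `gone` is excluded entirely. Whether either case should still produce a monthly report is left open.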