
Remove unmaintained `uri` field from `dim_gtfs_datasets`

Open lauriemerrell opened this issue 2 years ago • 2 comments

As a data user, I don't want the unmaintained `uri` field to exist on the mart `dim_gtfs_datasets` table because it is misleading and may lead people to use an outdated field. (Specifically, this happened here: https://github.com/cal-itp/data-infra/pull/3066#discussion_r1385028228.)

Acceptance criteria:

  • [ ] Remove the uri field from dim_gtfs_datasets (should first double check if anyone is using it)
  • [ ] Add a reconstructed pipeline_url field by decoding base64_url, as described in this comment, or bring pipeline_url through from the intermediate table. (The only open question is whether bringing pipeline_url through would affect the versioning; if it would, err on the side of decoding the existing column.)
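
For reference, the decoding option could look roughly like the sketch below. It assumes base64_url holds the URL-safe base64 encoding of the UTF-8 pipeline URL, possibly with the trailing `=` padding stripped; that encoding is an assumption here, not something confirmed in this ticket. The function name `decode_base64_url` and the example URL are hypothetical, and the real change would presumably live in the warehouse SQL rather than in Python.

```python
import base64


def decode_base64_url(base64_url: str) -> str:
    """Recover a URL string from its URL-safe base64 encoding.

    Assumption: the value is URL-safe base64 of a UTF-8 string and may
    have had its '=' padding removed.
    """
    # Restore padding so the length is a multiple of 4 before decoding.
    padded = base64_url + "=" * (-len(base64_url) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Round trip with a made-up URL (not a real feed).
example_url = "https://example.com/gtfs.zip"
encoded = base64.urlsafe_b64encode(example_url.encode("utf-8")).decode("ascii")
decoded = decode_base64_url(encoded)
decoded_unpadded = decode_base64_url(encoded.rstrip("="))
```

If the decoded value ever differs from the intermediate table's pipeline_url for the same row, that would be worth checking before choosing between the two options.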

lauriemerrell avatar Nov 07 '23 14:11 lauriemerrell

I'm a little confused. Both uri and pipeline_url are maintained in Airtable.

evansiroky avatar Nov 28 '23 18:11 evansiroky

Apologies, I think I got it confused. I do think that having the templated URL with braces in the warehouse is confusing, given that pipeline_url is what is actually used for downloads and what can key to other data, so the proposal would be to suppress uri and defer to pipeline_url. Does that seem ok? I can update the ticket to say not that uri is unmaintained, but that it may be confusing because it does not align with base64_url, which is the widespread key.

lauriemerrell avatar Nov 28 '23 18:11 lauriemerrell