Remove unmaintained `uri` field from `dim_gtfs_datasets`
As a data user, I don't want the unmaintained `uri` field to exist on the mart `dim_gtfs_datasets` table because it is deceptive and may lead to people using an outdated field. (Specifically, this happened here: https://github.com/cal-itp/data-infra/pull/3066#discussion_r1385028228.)
Acceptance criteria:
- [ ] Remove the `uri` field from `dim_gtfs_datasets` (should first double check if anyone is using it)
- [ ] Add a reconstructed `pipeline_url` field based on decoding `base64_url` as described in this comment, or bring `pipeline_url` through from the intermediate table (only question would be double checking whether bringing `pipeline_url` through would affect the versioning, in which case maybe err on the side of decoding the existing column)
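For reference, a minimal sketch of what decoding `base64_url` back into a URL could look like. This assumes `base64_url` is a URL-safe base64 encoding of the UTF-8 URL string, which should be confirmed against how the column is actually populated. The function name `decode_base64_url` and the example URL are hypothetical, and the real change would presumably live in the warehouse SQL rather than Python.

```python
import base64


def decode_base64_url(base64_url: str) -> str:
    """Decode a URL-safe base64 string back into the URL it encodes.

    Assumption: the value is URL-safe base64 of a UTF-8 string. Padding is
    restored first in case stored values have trailing '=' stripped.
    """
    padded = base64_url + "=" * (-len(base64_url) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Round-trip check with a made-up URL (not a real feed).
example_url = "https://example.com/feed.zip"
encoded = base64.urlsafe_b64encode(example_url.encode("utf-8")).decode("ascii")
assert decode_base64_url(encoded) == example_url
```

If the decoded value does not match `pipeline_url` in the intermediate table for a sample of rows, that would suggest the encoding assumption above is wrong.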
I'm a little confused. Both `uri` and `pipeline_url` are maintained in Airtable.
Apologies, I think I got it confused. I do think that having the templated URL with braces in the warehouse is confusing, given that `pipeline_url` is what is actually used for downloads and what can key to other data, so the proposal would be to suppress `uri` and defer to `pipeline_url`. Does that seem ok? I can update the ticket to say not that `uri` is unmaintained, but that it is perhaps confusing since it does not align with `base64_url`, which is the widespread key.