augur
augur copied to clipboard
Refactor dates and times in the Commits table
(1) The dates in augur_data.commits have been stored in character varying fields. This forces coversion to and from date types every time they're used, and could lead to code errors:
Fields in question:
cmt_author_date │ character varying(10)
cmt_committer_date │ character varying
(2) Of the other datetimes in the table, half are timestamp with time zone, and half are timestamp without time zone. These are certainly already leading to query errors:
cmt_date_attempted │ timestamp(0) without time zone
cmt_ght_committed_at │ timestamp(0) without time zone
cmt_committer_timestamp │ timestamp(0) with time zone
cmt_author_timestamp │ timestamp(0) with time zone
data_collection_date │ timestamp(0) without time zone
Was there a reason this was done, or was it completely a design error?
We refactored brianwarner/facade into Augur for commits (with permission). It came that way. It can be fixed. We've changed other things and kept it updated. I'll ask Brian the origins of the choice. I suspect it emerges from git version inconsistencies that were more of an issue in the before times and less of an issue today, since all that data comes from reading the git log.
The ones with time zone are additions based on an earlier collaboration. We did that. Adding a column seemed less risky to me than changing one. It allowed for a long period of sustained validation. It's toenail fungus we can clean up now.
We should be able to just change the type for data_collection_date, though, no? I mean, we know that's going to be in the system timezone.
@jberkus : The data_collection_date is just when we got the data. Are you thinking of using that as some kind of offset?
Yes, but it's TIMESTAMP WITHOUT TIME ZONE. Nothing in the system should be WITHOUT TIME ZONE, and that one seems like an easy fix -- which TZ it is doesn't require guesswork.