Divide analysis_data table, create new commits table

Open brianwarner opened this issue 6 years ago • 0 comments

Per discussion with @sgoggins, this issue proposes some major structural changes to the way commit data is stored. It was prompted by the discussion around #33.

For a while I have wanted to optimize analysis_data. Each row in the table describes one file that changed in a commit, and each row also carries its own copy of the author and committer info. When a commit changes a single file, that's not really a big deal. But when a commit changes a lot of files, the same metadata is duplicated once per file.

There is some benefit in breaking this info out into a separate table called commits. It would reduce the overall size of analysis_data (I haven't run into issues with this yet, but I'm not using it at the same scale as Sean; see #31). It would also yield a graceful solution for #33 by letting us start over and store dates as a native DATETIME rather than as ISO 8601 strings in a VARCHAR.

It also gives us a new central place to store the commit message, which may be useful info.
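To make the idea concrete, here is a rough sketch of what the split could look like. It uses Python's built-in sqlite3 only so that it runs anywhere; Facade itself uses MySQL, where the date columns would be native DATETIME. The column names, the sample commit, and the choice to normalize dates to UTC are all assumptions for illustration, not the actual schema.

```python
import sqlite3
from datetime import datetime, timezone

def to_datetime(iso_string):
    """Turn an ISO 8601 string into a 'YYYY-MM-DD HH:MM:SS' value, normalized to UTC."""
    parsed = datetime.fromisoformat(iso_string)
    if parsed.tzinfo is None:
        # Assumption: values without an offset are treated as already being UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commits (              -- one row per commit (hypothetical columns)
    commit_hash     TEXT PRIMARY KEY,
    author_name     TEXT,
    author_email    TEXT,
    author_date     TEXT,           -- would be DATETIME in MySQL
    committer_name  TEXT,
    committer_email TEXT,
    committer_date  TEXT,           -- would be DATETIME in MySQL
    message         TEXT
);
CREATE TABLE analysis_data (        -- one row per file changed in a commit
    commit_hash TEXT REFERENCES commits(commit_hash),
    filename    TEXT,
    added       INTEGER,
    removed     INTEGER
);
""")

# A made-up commit that touches three files: the metadata is stored once.
iso = "2019-04-01T17:04:00-05:00"
conn.execute(
    "INSERT INTO commits VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("abc123", "Dev", "dev@example.com", to_datetime(iso),
     "Dev", "dev@example.com", to_datetime(iso), "Refactor parser"),
)
conn.executemany(
    "INSERT INTO analysis_data VALUES (?, ?, ?, ?)",
    [("abc123", "a.py", 10, 2), ("abc123", "b.py", 4, 0), ("abc123", "c.py", 1, 1)],
)

commit_rows = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
file_rows = conn.execute("SELECT COUNT(*) FROM analysis_data").fetchone()[0]
print(commit_rows, file_rows)
```

With this layout the author, committer, and message fields are written once per commit, no matter how many files the commit touched.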

The main changes required are:

  • Alter setup.py to move these columns out of analysis_data and into a new commits table
  • Add a clause to the update_db function in facade-worker.py that adds the new commits table, copies over the commit and author/committer info, removes the old columns from analysis_data and optimizes it, and then does a quick walk through the git log of each repo to get full datetime info for authors/committers plus commit messages.
  • Update the caching functions to use the new join between analysis_data and commits
  • Add the ability to view commit messages to various UIs
  • Cut a new major release, because this is a significant database change
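A minimal sketch of the migration step and the resulting join might look like the following. Again this uses sqlite3 so it is self-contained; the real update_db clause would run against MySQL and could drop columns in place rather than rebuilding the table. The function name migrate, the reduced column set, and the sample rows are hypothetical.

```python
import sqlite3

def migrate(conn):
    """Move author/committer columns out of analysis_data into a new commits table.

    Returns False without touching anything if commits already exists.
    """
    cur = conn.cursor()
    already = cur.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'commits'"
    ).fetchone()
    if already:
        return False

    cur.execute("""
        CREATE TABLE commits (
            commit_hash     TEXT PRIMARY KEY,
            author_email    TEXT,
            author_date     TEXT,
            committer_email TEXT,
            committer_date  TEXT,
            message         TEXT    -- stays NULL until a later git log walk fills it
        )""")
    # Collapse the per-file duplicates down to one row per commit.
    cur.execute("""
        INSERT INTO commits (commit_hash, author_email, author_date,
                             committer_email, committer_date)
        SELECT commit_hash, MIN(author_email), MIN(author_date),
               MIN(committer_email), MIN(committer_date)
        FROM analysis_data
        GROUP BY commit_hash""")
    # Rebuild analysis_data with only the per-file columns.
    cur.execute("""
        CREATE TABLE analysis_data_new (
            commit_hash TEXT, filename TEXT, added INTEGER, removed INTEGER
        )""")
    cur.execute("""
        INSERT INTO analysis_data_new
        SELECT commit_hash, filename, added, removed FROM analysis_data""")
    cur.execute("DROP TABLE analysis_data")
    cur.execute("ALTER TABLE analysis_data_new RENAME TO analysis_data")
    conn.commit()
    return True

# Legacy layout with made-up data: metadata repeated on every file row.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE analysis_data (
        commit_hash TEXT, author_email TEXT, author_date TEXT,
        committer_email TEXT, committer_date TEXT,
        filename TEXT, added INTEGER, removed INTEGER
    )""")
conn.executemany(
    "INSERT INTO analysis_data VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("abc123", "dev@example.com", "2019-04-01T17:04:00-05:00",
         "dev@example.com", "2019-04-01T17:04:00-05:00", "a.py", 10, 2),
        ("abc123", "dev@example.com", "2019-04-01T17:04:00-05:00",
         "dev@example.com", "2019-04-01T17:04:00-05:00", "b.py", 5, 0),
        ("def456", "other@example.com", "2019-04-02T09:00:00+00:00",
         "other@example.com", "2019-04-02T09:00:00+00:00", "c.py", 3, 1),
    ],
)

migrate(conn)

# The kind of join a caching function would need after the split.
rows = conn.execute("""
    SELECT c.author_email, SUM(a.added)
    FROM analysis_data a
    JOIN commits c ON c.commit_hash = a.commit_hash
    GROUP BY c.author_email
    ORDER BY c.author_email""").fetchall()
print(rows)
```

The early return when commits already exists is what would let the migration run once on the first worker pass after upgrading and be skipped afterwards.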

While this is a big change, in theory it should be possible to make all of these changes transparently for a user with an existing database. The first facade-worker.py run after pulling this code will take longer than usual, but that's likely the only impact.

brianwarner, Apr 01 '19 17:04