rustc-perf

Rename database schema items for clarity and consistency

Open wesleywiser opened this issue 4 years ago • 2 comments

I've been looking at the current database schema to try to understand how it works. I've looked at the recently added schema documentation and some of the names were difficult to decipher at a glance. For example, the column names aid and cid and the table names pstat_series and pstats. (What does the p stand for?)

I think it would help the reader understand the database schema faster if some of these things were renamed. I propose the following:

  • aid columns -> artifact_id
  • cid columns -> collection_id
  • crate columns -> benchmark_id
  • series columns -> series_id
  • benchmark.stabilized -> benchmark.runs_on_stable
  • pstats table -> statistics
  • pstat_series table -> statistics_series

This makes the schema consistent: every column that references another table's primary key is identified by the _id suffix. Expanding the single-letter abbreviations also makes it easier to understand what a column is at a glance.

This is relatively easy to do and the migrations to perform these renamings will run quickly since this only affects table metadata and not the content of the tables themselves.
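As a rough illustration of the kind of migration proposed above, here is a minimal sketch using Python's built-in sqlite3 module. The table definitions are hypothetical, heavily simplified stand-ins rather than the real rustc-perf schema, and the sketch assumes a SQLite version with `ALTER TABLE ... RENAME COLUMN` support (3.25 or later).

```python
import sqlite3

# Hypothetical, simplified stand-ins for the real tables (assumption:
# the actual schema has more columns and constraints than shown here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pstat_series (id INTEGER PRIMARY KEY, crate TEXT, metric TEXT);
    CREATE TABLE pstats (series INTEGER, aid INTEGER, cid INTEGER, value REAL);
    INSERT INTO pstat_series VALUES (1, 'hello-world', 'instructions');
    INSERT INTO pstats VALUES (1, 10, 20, 123.0);
""")

# The renames change only the schema definition; existing rows are untouched.
migration = """
    ALTER TABLE pstats RENAME COLUMN aid TO artifact_id;
    ALTER TABLE pstats RENAME COLUMN cid TO collection_id;
    ALTER TABLE pstats RENAME COLUMN series TO series_id;
    ALTER TABLE pstat_series RENAME COLUMN crate TO benchmark_id;
    ALTER TABLE pstats RENAME TO statistics;
    ALTER TABLE pstat_series RENAME TO statistics_series;
"""
conn.executescript(migration)

# Inspect the result: new column names, same row contents.
columns = [row[1] for row in conn.execute("PRAGMA table_info(statistics)")]
rows = conn.execute("SELECT * FROM statistics").fetchall()
print(columns)
print(rows)
```

The same statements should carry over to Postgres with little or no change, though that is not verified here.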

wesleywiser avatar Jul 26 '21 14:07 wesleywiser

What does the p stand for?

process (i.e., the statistics describe a particular invocation of rustc)

I have no objections to renaming columns; I think the migrations should be essentially automatic.

Mark-Simulacrum avatar Jul 26 '21 15:07 Mark-Simulacrum

A few nits:

  • statistics_series might not be the most appropriate name. This table holds a description of a test case and a particular metric (i.e., the elements that describe a statistics series). The actual series data (i.e., the values for that test case and metric pair over time) is derived from this description and the data in the statistics (née pstats) table. The name statistics_series isn't bad; my only concern is that it might give the impression that the series data itself lives in that table. However, I can't really think of a better alternative.
  • benchmark_id will point to the benchmark name. Some prefer that columns with an _id suffix point to auto-incrementing integer primary key columns, not string data. I don't feel too strongly though.
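
To make both nits concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables are hypothetical simplifications that use the proposed names, not the real schema, and ordering by artifact_id is only an assumed stand-in for "over time". It shows that statistics_series stores just the description, that the series values come from joining it with statistics, and that benchmark_id holds a string name rather than an integer key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplified tables using the proposed names.
    CREATE TABLE statistics_series (
        id INTEGER PRIMARY KEY, benchmark_id TEXT, metric TEXT);
    CREATE TABLE statistics (
        series_id INTEGER, artifact_id INTEGER, value REAL);
    -- One series description: a benchmark name (a string) plus a metric.
    INSERT INTO statistics_series VALUES (1, 'hello-world', 'instructions');
    -- The measured values live in the other table.
    INSERT INTO statistics VALUES (1, 100, 5.0), (1, 101, 5.5), (1, 102, 5.2);
""")

def series_values(conn, benchmark, metric):
    """Derive the data points of a series from its description."""
    query = """
        SELECT st.artifact_id, st.value
        FROM statistics_series AS ss
        JOIN statistics AS st ON st.series_id = ss.id
        WHERE ss.benchmark_id = ? AND ss.metric = ?
        ORDER BY st.artifact_id
    """
    return conn.execute(query, (benchmark, metric)).fetchall()

points = series_values(conn, "hello-world", "instructions")
print(points)
```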

rylev avatar Jul 28 '21 12:07 rylev