online-judge icon indicating copy to clipboard operation
online-judge copied to clipboard

Sort submissions by date instead of ID; fixes #1526

Open xiaowuc1 opened this issue 4 years ago • 1 comments

xiaowuc1 avatar Sep 05 '20 00:09 xiaowuc1

Codecov Report

Merging #1527 into master will not change coverage. The diff coverage is 0.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1527   +/-   ##
=======================================
  Coverage   46.05%   46.05%           
=======================================
  Files         209      209           
  Lines       12059    12059           
=======================================
  Hits         5554     5554           
  Misses       6505     6505           
Impacted Files Coverage Δ
judge/views/submission.py 32.91% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 137d598...038e90f. Read the comment docs.

codecov-commenter avatar Sep 05 '20 00:09 codecov-commenter

Closing because while nice, this change makes /submissions dramatically slower when the submissions table doesn't fit in RAM :disappointed:

Xyene avatar Oct 31 '22 00:10 Xyene

Of course this is way slower because we have to sort.

But... what if we didn't have to sort? What if we changed how the data is stored on the database?

The first thought is to use ALTER TABLE ... ORDER BY. This would be an ideal usage for us, realistically, since we only want this reordering done once. Alas, MySQL doesn't work like that. Under InnoDB, records are always sorted by the clustered index, which in our case, is our primary key.

Ok, what if we do that?

BEGIN;
ALTER TABLE judge_submission
  MODIFY id INT,
  ADD KEY (id),
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (date, id);
SET FOREIGN_KEY_CHECKS = 0;
ALTER TABLE judge_submission
  MODIFY id INT AUTO_INCREMENT;
SET FOREIGN_KEY_CHECKS = 1;
COMMIT;

Some notes about the query:

  • Locally, MySQL complains if I combine all these alters into one statement, claiming that the id column doesn't exist. The above version works fine, and I hope that the first alter is expensive and the second is cheap (so combining wouldn't help that much).
  • Foreign key checks have to be turned off because MySQL won't let us change the datatype of a column that is referred to by other tables. Since we are only adding auto increment to the field, this will cause no issues.
  • We add a key to (id). This is both necessary and a good idea. It's necessary because MySQL requires auto-increment columns to have indexes.

Maybe this isn't the best option. In particular, changing the primary key might have important ramifications for other parts of the database.

We can consider rebuilding the table as well, but this comes with some considerations:

  • Changing the IDs of submissions breaks hard links, that surely exist in many places.
  • The actual execution of this is challenging - the foreign keys make it challenging to do this.

Initially, I was going to propose this method as well, but it genuinely is quite annoying, so I gave up. Also, I think the first point is a serious consideration.

Riolku avatar Nov 01 '22 15:11 Riolku

You can change the clustered index, but at what cost? Loading submissions by ID is a very common use case, and that will suddenly be more expensive.

What do we actually gain? Less annoying ordering many pages down in the submission list? Seems not worth.

quantum5 avatar Nov 01 '22 22:11 quantum5

We still have an index on id. What operation will get seriously slower?

Riolku avatar Nov 01 '22 22:11 Riolku