domjudge icon indicating copy to clipboard operation
domjudge copied to clipboard

DOMjudge has been running a very slow SQL

Open Dup4 opened this issue 2 years ago • 9 comments
trafficstars

Description of the problem

Replace this line with a short description.

The day after a contest, we suddenly found out that DOMjudge could not be accessed.

Initially, we found that the database load was very high, using up almost all the cpu cores on the machine.

image

By troubleshooting the database's slow SQL, we found that DOMjudge had been running a SQL statement like this

SELECT DISTINCT 
j0_.judgingid AS judgingid_0, 
j0_.starttime AS starttime_1, 
j0_.endtime AS endtime_2, 
j0_.result AS result_3, 
j0_.verified AS verified_4, 
j0_.jury_member AS jury_member_5, 
j0_.verify_comment AS verify_comment_6, 
j0_.valid AS valid_7, 
j0_.output_compile AS output_compile_8, 
j0_.metadata AS metadata_9, 
j0_.seen AS seen_10, 
j0_.judge_completely AS judge_completely_11, 
j0_.uuid AS uuid_12, 
j0_.cid AS cid_13, 
j0_.submitid AS submitid_14, 
j0_.rejudgingid AS rejudgingid_15, 
j0_.prevjudgingid AS prevjudgingid_16, 
j0_.errorid AS errorid_17 
FROM judging j0_ 
LEFT JOIN rejudging r1_ ON j0_.rejudgingid = r1_.rejudgingid 
INNER JOIN judging_run j2_ ON j0_.judgingid = j2_.judgingid 
INNER JOIN judgetask j3_ ON j2_.judgetaskid = j3_.judgetaskid 
INNER JOIN judgehost j4_ ON j3_.judgehostid = j4_.judgehostid 
WHERE j4_.hostname = 'judgedaemon-802-2' 
AND j0_.judgingid = j3_.jobid 
AND j2_.runresult IS NULL 
AND (j0_.valid = 1 OR r1_.valid = 1) 
AND j0_.result <> 'compiler-error';
image

I tried running this SQL and it was very slow.

show full processlist;

By running the above command on MariaDB, we do see that a similar SQL statement is being run all the time, which I guess is the reason for the high load on the DB.

image

So what can I do about this situation.

Your environment

Include details about your installation here.

  • DOMjudge version (e.g. 7.0.0 or a github commit hash)
  • Operating system / Linux distribution and version (e.g. Ubuntu 18.04)
  • Webserver (e.g. Apache or nginx)
  • DOMjudge/8.3.0DEV/948b599af
  • Ubuntu20.04
  • Nginx

Steps to reproduce

Replace this with a description how we can reproduce your bug.

  • Step 1
  • Step 2
  • Step 3

Expected behaviour

Replace this line with what you would expect to happen.

Actual behaviour

Replace this line with what happens instead.

Any other information that you want to share?

Please include webserver, symfony and judgedaemon log snippets here as appropriate. Screenshots may help in case of UI bugs.

Dup4 avatar Mar 19 '23 09:03 Dup4

Seems to be this query: https://github.com/domjudge/domjudge/blob/main/webapp/src/Controller/API/JudgehostController.php#L176-L192

nickygerritsen avatar Mar 19 '23 09:03 nickygerritsen

Doing an explain gives:

1	SIMPLE	j4_	const	PRIMARY,hostname	hostname	258	const	1	Using index; Using temporary
1	SIMPLE	j2_	ALL	testcaseid,judgingid,IDX_29A6E6E13CBA64F2				92	Using where
1	SIMPLE	j0_	eq_ref	PRIMARY	PRIMARY	4	domjudge.j2_.judgingid	1	Using where
1	SIMPLE	r1_	eq_ref	PRIMARY	PRIMARY	4	domjudge.j0_.rejudgingid	1	Using where; Distinct
1	SIMPLE	j3_	eq_ref	PRIMARY,jobid,judgehostid,judgehostid_jobid,judgehostid_valid_priority,specific_type	PRIMARY	4	domjudge.j2_.judgetaskid	1	Using where; Distinct

Dropping the DISTINCT would give:

1	SIMPLE	j4_	const	PRIMARY,hostname	hostname	258	const	1	Using index
1	SIMPLE	j2_	ALL	testcaseid,judgingid,IDX_29A6E6E13CBA64F2				92	Using where
1	SIMPLE	j0_	eq_ref	PRIMARY	PRIMARY	4	domjudge.j2_.judgingid	1	Using where
1	SIMPLE	r1_	eq_ref	PRIMARY	PRIMARY	4	domjudge.j0_.rejudgingid	1	Using where
1	SIMPLE	j3_	eq_ref	PRIMARY,jobid,judgehostid,judgehostid_jobid,judgehostid_valid_priority,specific_type	PRIMARY	4	domjudge.j2_.judgetaskid	1	Using where

I'm wondering if we would do the distinct check in PHP whether that would make it faster. @Dup4 could you try running the query without the DISTINCT on your database and see how fast it is?

nickygerritsen avatar Mar 19 '23 09:03 nickygerritsen

Doing an explain gives:

1	SIMPLE	j4_	const	PRIMARY,hostname	hostname	258	const	1	Using index; Using temporary
1	SIMPLE	j2_	ALL	testcaseid,judgingid,IDX_29A6E6E13CBA64F2				92	Using where
1	SIMPLE	j0_	eq_ref	PRIMARY	PRIMARY	4	domjudge.j2_.judgingid	1	Using where
1	SIMPLE	r1_	eq_ref	PRIMARY	PRIMARY	4	domjudge.j0_.rejudgingid	1	Using where; Distinct
1	SIMPLE	j3_	eq_ref	PRIMARY,jobid,judgehostid,judgehostid_jobid,judgehostid_valid_priority,specific_type	PRIMARY	4	domjudge.j2_.judgetaskid	1	Using where; Distinct

Dropping the DISTINCT would give:

1	SIMPLE	j4_	const	PRIMARY,hostname	hostname	258	const	1	Using index
1	SIMPLE	j2_	ALL	testcaseid,judgingid,IDX_29A6E6E13CBA64F2				92	Using where
1	SIMPLE	j0_	eq_ref	PRIMARY	PRIMARY	4	domjudge.j2_.judgingid	1	Using where
1	SIMPLE	r1_	eq_ref	PRIMARY	PRIMARY	4	domjudge.j0_.rejudgingid	1	Using where
1	SIMPLE	j3_	eq_ref	PRIMARY,jobid,judgehostid,judgehostid_jobid,judgehostid_valid_priority,specific_type	PRIMARY	4	domjudge.j2_.judgetaskid	1	Using where

I'm wondering if we would do the distinct check in PHP whether that would make it faster. @Dup4 could you try running the query without the DISTINCT on your database and see how fast it is?

image

Dup4 avatar Mar 19 '23 09:03 Dup4

So still slow but less slow? How many judgings, rejudgings, judging_runs and judgetasks do you have?

And could you try the same query but then twice:

  • Once with only j0_.valid =1
  • Once with only r1_.valid = 1 ? So that we get rid of the OR

nickygerritsen avatar Mar 19 '23 09:03 nickygerritsen

So still slow but less slow? How many judgings, rejudgings, judging_runs and judgetasks do you have?

And could you try the same query but then twice:

  • Once with only j0_.valid =1
  • Once with only r1_.valid = 1 ? So that we get rid of the OR
image image image

Dup4 avatar Mar 19 '23 09:03 Dup4

I've checked again today and it may not be a SQL statement problem.

image

I found out through EXPLAIN that the submission table is not indexed.

image image image image

Then by SHOW INDEX I suddenly found that many tables in the database have a Cardinality of 0 in their indexes.

When I executed ANALYZE TABLE judgeing, the following SQL statement was executed very fast.

SELECT DISTINCT 
j0_.judgingid AS judgingid_0, 
j0_.starttime AS starttime_1, 
j0_.endtime AS endtime_2, 
j0_.result AS result_3, 
j0_.verified AS verified_4, 
j0_.jury_member AS jury_member_5, 
j0_.verify_comment AS verify_comment_6, 
j0_.valid AS valid_7, 
j0_.output_compile AS output_compile_8, 
j0_.metadata AS metadata_9, 
j0_.seen AS seen_10, 
j0_.judge_completely AS judge_completely_11, 
j0_.uuid AS uuid_12, 
j0_.cid AS cid_13, 
j0_.submitid AS submitid_14, 
j0_.rejudgingid AS rejudgingid_15, 
j0_.prevjudgingid AS prevjudgingid_16, 
j0_.errorid AS errorid_17 
FROM judging j0_ 
LEFT JOIN rejudging r1_ ON j0_.rejudgingid = r1_.rejudgingid 
INNER JOIN judging_run j2_ ON j0_.judgingid = j2_.judgingid 
INNER JOIN judgetask j3_ ON j2_.judgetaskid = j3_.judgetaskid 
INNER JOIN judgehost j4_ ON j3_.judgehostid = j4_.judgehostid 
WHERE j4_.hostname = 'judgedaemon-802-2' 
AND j0_.judgingid = j3_.jobid 
AND j2_.runresult IS NULL 
AND (j0_.valid = 1 OR r1_.valid = 1) 
AND j0_.result <> 'compiler-error';
image

However, some tables with a large number of rows of data have a Cardinality value.

image image

As I am not very familiar with MariaDB, I would like to ask what would cause indexes to not be built for some tables, i.e. Cardinality = 0.

Is it because there are so few rows in the table that the value of Cardinality is not generated, but some SQL statements INNER JOIN or LEFT JOIN some tables with more rows such as judging or judgetask that make the query slow?

Dup4 avatar Mar 20 '23 13:03 Dup4

@Dup4 could you somehow give us a dump of your database? Doesn't have to be public if you don't want (on Slack DM or something)

nickygerritsen avatar May 06 '23 10:05 nickygerritsen

@Dup4 could you somehow give us a dump of your database? Doesn't have to be public if you don't want (on Slack DM or something)

This problem was solved when I executed ANALYZE TABLE ${table_name}, does this database still make sense to reproduce?

If so, I can provide the dump of my database.

Dup4 avatar May 08 '23 23:05 Dup4

@Dup4 could you somehow give us a dump of your database? Doesn't have to be public if you don't want (on Slack DM or something)

This problem was solved when I executed ANALYZE TABLE ${table_name}, does this database still make sense to reproduce?

If so, I can provide the dump of my database.

Yes, if/when we fix this we want to be sure that applying the fix to your database does indeed fix your issue. Also as other people might have that issue and that way we make sure that we actually fix their database with a proper migration.

vmcj avatar Nov 01 '23 06:11 vmcj

Closing due to inactivity. Feel free to reopen if there is more info.

nickygerritsen avatar Mar 24 '24 13:03 nickygerritsen