solid_queue icon indicating copy to clipboard operation
solid_queue copied to clipboard

Possibility of making `SolidQueue::Job.clear_finished_in_batches` more performant?

Open salmonsteak1 opened this issue 1 year ago • 5 comments

Hey there, I was analyzing some queries that is being done on my solid queue DB and I came across this:

DELETE
FROM
  "solid_queue_jobs"
WHERE
  "solid_queue_jobs"."id" IN (
  SELECT
    "solid_queue_jobs"."id"
  FROM
    "solid_queue_jobs"
  WHERE
    "solid_queue_jobs"."finished_at" IS NOT NULL
    AND "solid_queue_jobs"."finished_at" < $1
  LIMIT
    $2)

I believe it relates to the SolidQueue::Job.clear_finished_in_batches code, where I have been using it like so in a recurring job:

SolidQueue::Job.clear_finished_in_batches(finished_before: <some_time>)

I also did an EXPLAIN QUERY on this query, and I seems like it's doing a sequential scan on all rows in solid_queue_jobs which are older than the time specified in finished_before:

QUERY PLAN
Delete on solid_queue_jobs (cost=84.52..1473.01 rows=0 width=0) (actual time=3.724..3.725 rows=0 loops=1)
-> Nested Loop (cost=84.52..1473.01 rows=500 width=38) (actual time=0.383..2.057 rows=500 loops=1)
-> HashAggregate (cost=83.96..88.96 rows=500 width=40) (actual time=0.374..0.449 rows=500 loops=1)
Group Key: "ANY_subquery".id
Batches: 1 Memory Usage: 105kB
-> Subquery Scan on "ANY_subquery" (cost=0.00..82.71 rows=500 width=40) (actual time=0.012..0.287 rows=500 loops=1)
-> Limit (cost=0.00..77.71 rows=500 width=8) (actual time=0.005..0.225 rows=500 loops=1)
-> Seq Scan on solid_queue_jobs solid_queue_jobs_1 (cost=0.00..621188.83 rows=3996682 width=8) (actual time=0.004..0.201 rows=500 loops=1)
Filter: ((finished_at IS NOT NULL) AND (finished_at < now()))
-> Index Scan using solid_queue_jobs_pkey on solid_queue_jobs (cost=0.56..2.77 rows=1 width=14) (actual time=0.003..0.003 rows=1 loops=500)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.723 ms
Trigger for constraint fk_rails_318a5533ed: time=2.681 calls=500
Trigger for constraint fk_rails_39bbc7a631: time=2.883 calls=500
Trigger for constraint fk_rails_4cd34e2228: time=2.044 calls=500
Trigger for constraint fk_rails_81fcbd66af: time=2.731 calls=500
Trigger for constraint fk_rails_9cfe4d4944: time=2.608 calls=500
Trigger for constraint fk_rails_c4316f352d: time=2.897 calls=500
Execution Time: 19.997 ms 

More information from query analysis on CloudSQL: image

Is it possible to simplify this query, or would there be a timeline for when clear_finished_jobs_after will happen automatically?

salmonsteak1 avatar Nov 05 '24 19:11 salmonsteak1

Hmm... interesting that you get that query. That code results in the following query for us:

DELETE FROM `solid_queue_jobs` 
WHERE `solid_queue_jobs`.`finished_at` IS NOT NULL 
AND `solid_queue_jobs`.`finished_at` < '2024-11-04 19:42:00.055618' 
LIMIT 1000;

that simply uses the index index_solid_queue_jobs_on_finished_at. Is this PostgreSQL?

rosa avatar Nov 05 '24 20:11 rosa

@rosa yes, we're using PostgreSQL

salmonsteak1 avatar Nov 06 '24 03:11 salmonsteak1

Ahh, I realised why this is the case. PostgreSQL doesn't support LIMIT on DELETE queries 😅 I don't have a good alternative for this one at the moment, I'm afraid. I'm not sure from reading the output from your EXPLAIN QUERY, but doesn this mean that the index on finished_at is not being used for the inner query?

Seq Scan on solid_queue_jobs solid_queue_jobs_1 (cost=0.00..621188.83 rows=3996682 width=8) (actual time=0.004..0.201 rows=500 loops=1)
Filter: ((finished_at IS NOT NULL) AND (finished_at < now()))

rosa avatar Nov 06 '24 11:11 rosa

Ahh, I realised why this is the case. PostgreSQL doesn't support LIMIT on DELETE queries 😅 I don't have a good alternative for this one at the moment, I'm afraid. I'm not sure from reading the output from your EXPLAIN QUERY, but doesn this mean that the index on finished_at is not being used for the inner query?

Seq Scan on solid_queue_jobs solid_queue_jobs_1 (cost=0.00..621188.83 rows=3996682 width=8) (actual time=0.004..0.201 rows=500 loops=1)
Filter: ((finished_at IS NOT NULL) AND (finished_at < now()))

Yes, I believe that's the case - I can't see much of a hit to the performance of our database now, but its a shame that PostgresSQL does it this way

salmonsteak1 avatar Nov 08 '24 07:11 salmonsteak1

A postgres approach would be something along the line of:

WITH rows_to_delete AS (
  SELECT id
  FROM solid_queue_jobs
  WHERE finished_at IS NOT NULL
    AND finished_at < $1
  LIMIT $2
)
DELETE FROM solid_queue_jobs
USING rows_to_delete
WHERE solid_queue_jobs.id = rows_to_delete.id;

Juksefantomet avatar Nov 28 '24 11:11 Juksefantomet