minion icon indicating copy to clipboard operation
minion copied to clipboard

Reimplementation of previous repair, but without significant performance hit

Open HEM42 opened this issue 1 year ago • 11 comments

Summary

Reimplementation of previous repair, but without significant performance hit on large amount of jobs

Motivation

Restore repair functionality removed with the release of v10.27, but where this time it works on large scale installations, see referenced issue for numbers

References

https://github.com/mojolicious/minion/issues/129

HEM42 avatar May 07 '24 18:05 HEM42

Please squash commits.

kraih avatar May 07 '24 20:05 kraih

reading the new minion query, it isn't obvious to me, does it handle deeper dependencies to me it looks like if A (finished) -> B (finished) -> C (active) then it will not clean up B but will still clean up A that's my reading of that query anyway, can you confirm?

jberger avatar May 08 '24 17:05 jberger

Oh, it seems I misremembered the previous functionality. I had thought it would preserve the whole chain but the query does not appear to be so https://github.com/mojolicious/minion/commit/cb9182acc83938ac46ee6227e4e9d914eb56cf28

jberger avatar May 08 '24 17:05 jberger

reading the new minion query, it isn't obvious to me, does it handle deeper dependencies to me it looks like if A (finished) -> B (finished) -> C (active) then it will not clean up B but will still clean up A that's my reading of that query anyway, can you confirm?

I had the exact same question, so made a little example for my own sake, just to map it out. Job 1 creates 2, 3, 4 and sets itself as parent for those. It also creates job 5 which is to be run when 2, 3 and 4 completes. Something like this:

   +---> 2 ---+
1 -|---> 3 ---|-> 5
   +---> 4 ---+

After running this is the state:

1:
    parents = []
    state   = finished
2:
    parents = [1]
    state   = finished
3:
    parents = [1]
    state   = failed
4:
    parents = [1]
    state   = finished
5:
    parents = [2,3,4]
    state   = inactive

Then how I read the query you'd for

SELECT id FROM minion_jobs WHERE state = 'finished' AND finished <= NOW() - INTERVAL '1 second' * ?

get 1, 2, 4 since those are the finished jobs, and for

EXCEPT SELECT unnest(parents) AS id FROM minion_jobs WHERE state != 'finished'

you'd get 1 from 3.parents and 2, 3, 4 from 5.parents. This should result in zero rows being deleted. At least how I understand it. Super if anyone else could sanity check this in case I have a bit of the dumb today.

What I have not verified is what happens when the chain is "deeper".

christopherraa avatar May 08 '24 18:05 christopherraa

reading the new minion query, it isn't obvious to me, does it handle deeper dependencies to me it looks like if A (finished) -> B (finished) -> C (active) then it will not clean up B but will still clean up A that's my reading of that query anyway, can you confirm?

This is correct, A would be removed in this case. But this is how the previous query worked.

Main focus was to restore previous functionality.

HEM42 avatar May 08 '24 18:05 HEM42

reading the new minion query, it isn't obvious to me, does it handle deeper dependencies to me it looks like if A (finished) -> B (finished) -> C (active) then it will not clean up B but will still clean up A that's my reading of that query anyway, can you confirm?

I had the exact same question, so made a little example for my own sake, just to map it out. Job 1 creates 2, 3, 4 and sets itself as parent for those. It also creates job 5 which is to be run when 2, 3 and 4 completes. Something like this:

   +---> 2 ---+
1 -|---> 3 ---|-> 5
   +---> 4 ---+

You would actually need to have something like

   +---> 2 ---+
1 -|---> 3 ---|-> 5
   |---> 4 ---|
   +----------+

In order to preserve the flow, atleast until 5 has finished

HEM42 avatar May 08 '24 18:05 HEM42

@HEM42 Ah! Good thing you clarified then, as I had misunderstood / misread the logic. Thank you. I think I'll have to consider myself neutral then, at least until I've pondered on the whole "delete parts of the tree" thing here. If 1 is deleted I think at least this should be documented so that it is clear to the users how this deletion works and how relationships needs to be set up to avoid partial deletes. It might be documented somewhere already if this is the previous functionality, but I cannot remember seeing it.

christopherraa avatar May 08 '24 18:05 christopherraa

@HEM42 Another follow up on this, as I couldn't get your explaination to make sense after reading the SQL again. So I basically made this minimal example:

create table minion_jobs(id integer, state text, parents integer[]);
insert into minion_jobs(id, state, parents) values 
     (1, 'finished', null),
     (2, 'finished', '{1}'),
     (3, 'failed', '{1}'),
     (4, 'finished', '{1}'),
     (5, 'inactive', '{2,3,4}');

when running a slightly modified query (skipping the timestamp for convenience here) gives this result:

SELECT id FROM minion_jobs WHERE state = 'finished' EXCEPT SELECT unnest(parents) AS id FROM minion_jobs WHERE state != 'finished';
┌────┐
│ id │
├────┤
└────┘
(0 rows)

What am I missing?

christopherraa avatar May 08 '24 18:05 christopherraa

@HEM42 Ah! Good thing you clarified then, as I had misunderstood / misread the logic. Thank you. I think I'll have to consider myself neutral then, at least until I've pondered on the whole "delete parts of the tree" thing here. If 1 is deleted I think at least this should be documented so that it is clear to the users how this deletion works and how relationships needs to be set up to avoid partial deletes. It might be documented somewhere already if this is the previous functionality, but I cannot remember seeing it.

The docs for remove_after states

Amount of time in seconds after which jobs that have reached the state finished and have no unresolved dependencies will be removed automatically by "repair", defaults to 172800 (2 days). It is not recommended to set this value below 2 days.

Which was the reason for #129 in the first place, and we think we have good solution to bring it back.

HEM42 avatar May 08 '24 18:05 HEM42

@HEM42 Another follow up on this, as I couldn't get your explaination to make sense after reading the SQL again. So I basically made this minimal example:

create table minion_jobs(id integer, state text, parents integer[]);
insert into minion_jobs(id, state, parents) values 
     (1, 'finished', null),
     (2, 'finished', '{1}'),
     (3, 'failed', '{1}'),
     (4, 'finished', '{1}'),
     (5, 'inactive', '{2,3,4}');

when running a slightly modified query (skipping the timestamp for convenience here) gives this result:

SELECT id FROM minion_jobs WHERE state = 'finished' EXCEPT SELECT unnest(parents) AS id FROM minion_jobs WHERE state != 'finished';
┌────┐
│ id │
├────┤
└────┘
(0 rows)

What am I missing?

Not missing anything, but if you where to not fail job with id 3. You would then see

create table minion_jobs(id integer, state text, parents integer[]);
insert into minion_jobs(id, state, parents) values
     (1, 'finished', null),
     (2, 'finished', '{1}'),
     (3, 'finished', '{1}'),
     (4, 'finished', '{1}'),
     (5, 'inactive', '{2,3,4}');

SELECT id FROM minion_jobs WHERE state = 'finished' EXCEPT SELECT unnest(parents) AS id FROM minion_jobs WHERE state != 'finished';
 id
----
  1
(1 row)

HEM42 avatar May 09 '24 05:05 HEM42

any updates on this ?

is there need for more information in order to take a decision

HEM42 avatar May 16 '24 09:05 HEM42