
Only allow 1 dump task to run

Open ComLock opened this issue 3 years ago • 14 comments

In a prod env there are multiple chefs, or a single misbehaving chef. So now it's running multiple dumps at the same time, and with a lot of data that might kill the cluster... so there should be some protection against running more than one dump at a time...

At the beginning of a named task, simply check if any task with the same name is running, and die or schedule for later? Dying is probably the best. Also, perhaps DT should show some nice UI error message...
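The "check and die" guard could look like this minimal sketch. The `TaskInfo` shape and `assertNotAlreadyRunning` helper here are hypothetical stand-ins for whatever the task runtime exposes, not the actual XP API:

```typescript
// Minimal model of what a task runtime might expose per task (assumed shape).
interface TaskInfo {
  id: string;
  name: string;
  state: 'WAITING' | 'RUNNING' | 'FINISHED' | 'FAILED';
}

// Throws if another task with the same name is still WAITING or RUNNING.
// Intended to be called at the top of the task, so it "dies" instead of racing.
function assertNotAlreadyRunning(allTasks: TaskInfo[], self: TaskInfo): void {
  const clash = allTasks.find(
    (t) =>
      t.id !== self.id && // ignore ourselves
      t.name === self.name &&
      (t.state === 'WAITING' || t.state === 'RUNNING') // not yet done
  );
  if (clash) {
    throw new Error(`Task ${self.name} [${clash.id}] is already running`);
  }
}
```

The thrown error is what a UI layer could turn into the "nice error message" mentioned above.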

ComLock avatar Dec 20 '21 09:12 ComLock

Perhaps a boolean parameter to submitTask, so that every programmer doesn't have to implement their own protection?

In my crawlers I have made my own special protection, because I must actually allow multiple runs of the same named task, just with different configs. So disallowing multiple tasks with the same config at the same time could be another option.
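One way to express that variant is to derive a lock key from the task name plus its config, so identical (name, config) pairs clash while different configs run in parallel. This is a sketch with assumed shapes, not Explorer's actual implementation:

```typescript
// Assumed shape of a running task with its runtime config.
interface RunningTask {
  id: string;
  name: string;
  config: Record<string, unknown>;
}

// Stable key for (name, config): identical configs collide, different ones don't.
// Sorting the keys makes the key independent of property order.
function lockKey(name: string, config: Record<string, unknown>): string {
  const parts = Object.keys(config)
    .sort()
    .map((k) => `${k}=${JSON.stringify(config[k])}`);
  return `${name}?${parts.join('&')}`;
}

function sameConfigAlreadyRunning(
  running: RunningTask[],
  name: string,
  config: Record<string, unknown>
): boolean {
  const key = lockKey(name, config);
  return running.some((t) => lockKey(t.name, t.config) === key);
}
```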

ComLock avatar Dec 20 '21 09:12 ComLock

Perhaps also give tasks a priority. For instance, while system.dump (high priority, let's call it P1) is running, starting my less important tasks, say P5, should be avoided.
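The priority idea could be gated with a check like the following sketch, where the numeric priority levels and `mayStart` helper are assumptions for illustration:

```typescript
// Assumed shape: each task carries a numeric priority, 1 = highest (e.g. system.dump).
interface PrioritizedTask {
  name: string;
  priority: number;
  running: boolean;
}

// A lower-priority candidate (higher number) should not start while any
// strictly higher-priority task is still running.
function mayStart(candidatePriority: number, tasks: PrioritizedTask[]): boolean {
  return !tasks.some((t) => t.running && t.priority < candidatePriority);
}
```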

ComLock avatar Dec 20 '21 09:12 ComLock

Maybe disallow running the same task multiple times if it has the same name, that is: Any task can only be running a single instance at any given time.

gbbirkisson avatar Dec 20 '21 11:12 gbbirkisson

Maybe disallow running the same task multiple times if it has the same name, that is: Any task can only be running a single instance at any given time.

That would break explorer. It needs to be optional somehow.

ComLock avatar Dec 20 '21 13:12 ComLock

That would break explorer. It needs to be optional somehow.

Explorer relies on running multiple tasks at the same time with the same name? Does it have to do that?

Maybe block same-name tasks by default, but if you really have to run same-name tasks, you provide a boolean param that says otherwise.
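A sketch of what that opt-out could look like as a wrapper around task submission. The `submitGuarded` wrapper, the `Submit` callback, and the `allowConcurrent` parameter name are all hypothetical, not an existing XP API:

```typescript
// Assumed: the underlying submit function returns a task id.
type Submit = (name: string) => string;

// Blocks same-name submissions by default; callers that genuinely need
// concurrent runs pass allowConcurrent = true (the "boolean param" above).
function submitGuarded(
  submit: Submit,
  runningNames: Set<string>,
  name: string,
  allowConcurrent = false
): string {
  if (!allowConcurrent && runningNames.has(name)) {
    throw new Error(`Task ${name} is already submitted`);
  }
  runningNames.add(name);
  return submit(name);
}
```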

gbbirkisson avatar Dec 21 '21 08:12 gbbirkisson

Yes, explorer runs multiple tasks at the same time with the same name. Yes, it has to do that. A collector is a task. A collection is a "config" for the collector task. Explorer uses the same task to crawl multiple collections, or websites if you like. All at the same time... Scheduled and/or manually started.

ComLock avatar Dec 21 '21 08:12 ComLock

I don't see how explorer is related here, as we would never change the scheduler to behave this way in general - allowing just one "dump" to execute at a time would however make sense to me.

sigdestad avatar Dec 21 '21 09:12 sigdestad

@sigdestad It's not about the scheduler... human error mostly occurs with manually started tasks... too many chefs, too many variables, too many things to keep an eye on.

ComLock avatar Dec 21 '21 11:12 ComLock

BTW, the reason explorer has to run multiple tasks at the same time is that there are not enough hours to crawl everything, even though we spread it out over all nights of the week. We could of course spread it out even more, crawling 24/7.

ComLock avatar Dec 21 '21 11:12 ComLock

as we would never change the scheduler to behave this way in general

Just to clarify: I mean only run one task with a given name at a time. That would cover the dump problem, and also other potential problems. In the case of explorer, each "crawl" could maybe have a different name, and thus run in parallel.

gbbirkisson avatar Dec 21 '21 11:12 gbbirkisson

@gbbirkisson Named tasks are created at coding/compile time. Task configs are configured by a search admin at runtime. I cannot program a named task per config, as I don't know the configs upfront, and they change all the time, since the internet is changing all the time.

ComLock avatar Dec 21 '21 11:12 ComLock

Damn. That explains why this is a bad idea :+1:

gbbirkisson avatar Dec 21 '21 12:12 gbbirkisson

As mentioned earlier, this should be a low-level check in the dump implementation that will simply prevent multiple dumps from running at the same time.

sigdestad avatar Dec 21 '21 21:12 sigdestad

Actually, dump, restore, backup, vacuum, and maybe a few other tasks must not run simultaneously.
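That is stricter than a same-name check: those tasks form one mutual-exclusion group where any member blocks every other member. A minimal sketch of that rule (the group contents and helper name are illustrative, not an XP API):

```typescript
// Maintenance operations that must never overlap: one shared lock group.
const EXCLUSIVE_GROUP = new Set(['dump', 'restore', 'backup', 'vacuum']);

// Returns true if starting `name` is safe given the names currently running.
// Any member of the group blocks every other member (and a second instance
// of itself); tasks outside the group are unaffected by this rule.
function mayStartMaintenance(name: string, runningNames: string[]): boolean {
  if (!EXCLUSIVE_GROUP.has(name)) {
    return true;
  }
  return !runningNames.some((r) => EXCLUSIVE_GROUP.has(r));
}
```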

rymsha avatar Sep 13 '22 18:09 rymsha

Audit log cleanup should also have just one running job. But it's important to throw a warning if this actually occurs.

sigdestad avatar Jan 30 '23 08:01 sigdestad

Maybe we could support an "only one job at a time" flag in the API?

sigdestad avatar Jan 30 '23 08:01 sigdestad

Starting from XP 7.13 there will be two major improvements to the task API:

  • access to the task's own ID value from the controller
  • all tasks will get a name

The get method in lib-task can be used to find tasks and, for instance, pick out the ones with the same name.

For instance, management tasks use the following check to allow only one instance to run in the entire cluster:

    public static void checkAlreadySubmitted( final TaskInfo currentTaskInfo, final Collection<TaskInfo> allTasks )
    {
        final TaskId currentTaskId = currentTaskInfo.getId();
        final String currentTaskName = currentTaskInfo.getName();
        final Instant currentTaskStartTime = currentTaskInfo.getStartTime();

        final Optional<TaskInfo> priorTask = allTasks.stream()
            .filter( ti -> !ti.getId().equals( currentTaskId ) ) // filter out self (compare by value, not by reference)
            .filter( ti -> currentTaskName.equals( ti.getName() ) ) // look for the specific name
            .filter( ti -> !ti.isDone() ) // only WAITING and RUNNING count as submitted, others are done
            .filter( ti -> ti.getStartTime().isBefore( currentTaskStartTime ) || // the task submitted earlier wins
                ( ti.getStartTime().equals( currentTaskStartTime ) && // if start times are equal, pick one (but it must be a stable choice)
                    ti.getId().toString().compareTo( currentTaskId.toString() ) < 0 ) )
            .findAny();

        if ( priorTask.isPresent() )
        {
            throw new IllegalStateException(
                "Task " + currentTaskName + " [" + priorTask.get().getId() + "] is already submitted" );
        }
    }

rymsha avatar Feb 21 '23 11:02 rymsha

If there is a need to limit tasks per cluster node, there is a new field node in TaskInfo. It also covers #9625.

rymsha avatar Feb 21 '23 11:02 rymsha

Here's some TypeScript example code: https://github.com/enonic-playground/starter-tsup/blob/ca81b984e825841a6e2f7eddeacf99569763e675/src/main/resources/tasks/testRepoConnectionQuery/testRepoConnectionQuery.ts#L26-L66

ComLock avatar Mar 14 '23 10:03 ComLock