Only allow 1 dump task to run
In a prod env there are multiple chefs, or a single misbehaving chef. If multiple dumps end up running at the same time on a lot of data, it might kill the cluster... so there should be some protection against running more than one dump at a time...
At the beginning of a named task, simply check if any task with the same name is running, and either die or reschedule for later? Dying is probably best. Also, perhaps DT should show a nice UI error message...
Perhaps a boolean parameter to submitTask, so that every programmer doesn't have to implement their own protection?
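A minimal sketch of that check-and-die guard at the start of a task controller, assuming lib-task's list method with name/state filters (the descriptor name com.example:dump is hypothetical):

import { list } from '/lib/xp/task';

// Sketch: die at task start if another task with the same name is already active.
export function run(): void {
    const active = list({ name: 'com.example:dump', state: 'RUNNING' });
    // This task is itself RUNNING, so more than one entry means a duplicate.
    if (active.length > 1) {
        throw new Error("Task 'com.example:dump' is already running");
    }
    // ... the actual dump work goes here
}

Note that this naive version can kill both duplicates if they start at the same instant; the Java check further down this thread breaks such ties deterministically.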
In my crawlers I have made my own special protection, because I must actually allow multiple runs of the same named task, just with different configs. So disallowing running multiple tasks with the same config at the same time could be another option.
Perhaps also giving tasks priority. For instance, when system.dump (high-priority, let's call it P1) is running, starting my less important tasks, say P5, should be avoided.
Maybe disallow running the same task multiple times if it has the same name, that is: Any task can only be running a single instance at any given time.
That would break explorer. It needs to be optional somehow.
Explorer relies on running multiple tasks at the same time with the same name? Does it have to do that?
Maybe block same-name tasks by default, and if you really have to run same-name tasks, you provide a boolean param that says otherwise.
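Something like this hypothetical wrapper, for illustration (the submitUnique helper and allowConcurrent flag are not part of lib-task):

import { isRunning, submitTask } from '/lib/xp/task';

// Hypothetical: block same-name submissions by default; callers that really
// need concurrent same-name tasks (like explorer) opt in explicitly.
export function submitUnique(descriptor: string, config: object, allowConcurrent: boolean = false): string {
    if (!allowConcurrent && isRunning(descriptor)) {
        throw new Error(`Task '${descriptor}' is already running`);
    }
    return submitTask({ descriptor, config });
}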
Yes, explorer runs multiple tasks at the same time with the same name. Yes, it has to do that. A collector is a task. A collection is a "config" for the collector task. Explorer uses the same task to crawl multiple collections, or websites if you like. All at the same time... Scheduled and/or manually started.
I don't see how explorer is related here, as we would never change the scheduler to behave this way in general - allowing just one "dump" to execute at a time would however make sense to me.
@sigdestad It's not about the scheduler... mostly it's human error with manually started tasks... too many chefs, too many variables, too many things to keep an eye on.
BTW the reason explorer has to run multiple tasks at the same time is that there are not enough hours to crawl everything, even though we spread it out over all nights of the week. We could of course spread it out even more by crawling 24/7.
> as we would never change the scheduler to behave this way in general
Just to clarify: I mean only running one task with a given name at a time. That would cover the dump problem, and also other potential problems. In the case of explorer, each "crawl" could maybe have a different name, and thus run in parallel.
@gbbirkisson Named tasks are created at coding/compile time. Task configs are set up by a search admin at runtime. I cannot program a namedTask per config, as I don't know the configs upfront, and they change all the time, since the internet is changing all the time.
Damn. That explains why this is a bad idea :+1:
As mentioned earlier, this should be a low-level check in the dump implementation that will simply prevent multiple dumps from running at the same time.
Actually dump, restore, backup, vacuum and maybe a few other tasks must not run simultaneously.
Audit log cleanup should also just have one running job. But it's important to throw a warning if this actually occurs.
Maybe we could support an "only one job at a time" flag in the API?
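Such a flag could also generalize to the mutually exclusive group mentioned above. A sketch, where the task names in the list are illustrative rather than the real system descriptors:

import { isRunning, submitTask } from '/lib/xp/task';

// Illustrative group of tasks that must never overlap with each other.
const EXCLUSIVE_TASKS = ['system:dump', 'system:restore', 'system:backup', 'system:vacuum'];

export function submitExclusive(descriptor: string, config: object): string {
    const busy = EXCLUSIVE_TASKS.find((name) => isRunning(name));
    if (busy) {
        throw new Error(`Cannot start '${descriptor}' while '${busy}' is running`);
    }
    return submitTask({ descriptor, config });
}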
Starting from XP 7.13 there will be two major improvements to the task API:
- access to a task's own ID value from the controller
- all tasks will get a name
The list method in lib-task can be used to find tasks, for instance the ones with the same name.
For instance, management tasks use the following check to allow only one task of a given name to run in the entire cluster:
import java.time.Instant;
import java.util.Collection;
import java.util.Optional;

import com.enonic.xp.task.TaskId;
import com.enonic.xp.task.TaskInfo;

public static void checkAlreadySubmitted( final TaskInfo currentTaskInfo, final Collection<TaskInfo> allTasks )
{
    final TaskId currentTaskId = currentTaskInfo.getId();
    final String currentTaskName = currentTaskInfo.getName();
    final Instant currentTaskStartTime = currentTaskInfo.getStartTime();

    final Optional<TaskInfo> priorTask = allTasks.stream()
        .filter( ti -> !ti.getId().equals( currentTaskId ) ) // filter out self (equals, not !=, to compare by value)
        .filter( ti -> currentTaskName.equals( ti.getName() ) ) // look up the specific name
        .filter( ti -> !ti.isDone() ) // only WAITING and RUNNING are considered submitted, others are done
        .filter( ti -> ti.getStartTime().isBefore( currentTaskStartTime ) || // the task submitted earlier wins;
            ( ti.getStartTime().equals( currentTaskStartTime ) && // on equal start times, break the tie by id, so that
                ti.getId().toString().compareTo( currentTaskId.toString() ) < 0 ) ) // both sides agree on the single winner
        .findAny();

    if ( priorTask.isPresent() )
    {
        throw new IllegalStateException(
            "Task " + currentTaskName + " [" + priorTask.get().getId() + "] is already submitted" );
    }
}
If there is a need to limit tasks per cluster node, there is a new field node in TaskInfo. It also covers #9625.
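For a per-node variant of the same-name check, something like this sketch could work, assuming the node field is exposed on the objects returned by list (how the current node's name is obtained is left out):

import { list } from '/lib/xp/task';

// Count RUNNING tasks with a given name on a given cluster node,
// using the node field on TaskInfo described above.
export function runningOnNode(name: string, node: string): number {
    return list({ name, state: 'RUNNING' }).filter((t) => t.node === node).length;
}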
Here's some TypeScript example code: https://github.com/enonic-playground/starter-tsup/blob/ca81b984e825841a6e2f7eddeacf99569763e675/src/main/resources/tasks/testRepoConnectionQuery/testRepoConnectionQuery.ts#L26-L66