db-scheduler

Async, non-blocking jobs execution

Open coutoPL opened this issue 4 years ago • 13 comments

Hi,

Your lib almost perfectly fits my use case. But after inspecting the code, I suppose there is no support for asynchronous, non-blocking jobs.

What exactly do I mean? I have plenty of jobs to start (let's say 10k/min). Each of them makes one or more HTTP requests, which I call using a non-blocking HTTP client. So I have e.g. a CompletableFuture[Response], and based on the response I have to decide whether to reschedule the task instance or not (a custom task and ExecutionHandler seem to be great for that). Currently I have to block the thread, but this is not the way to go, because there are a lot of tasks to start (requirement: immediately or as soon as possible) and the mean wait for a response can be ~30s. The thread is blocked, just waiting for IO.

It seems this could be solved if db-scheduler let you define and use something like an async execution handler:

public interface ExecutionHandler<T> {
   CompletableFuture<CompletionHandler<T>> execute(TaskInstance<T> taskInstance, ExecutionContext executionContext);
}

Or maybe I missed something and there is already a way to achieve what I described above? If not, WDYT about introducing an async interface and adapting db-scheduler to work with it? Do you see any obstacles?
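To make the idea concrete, here is a minimal sketch of a task written against such an async handler. The types below (`Outcome`, `AsyncExecutionHandler`, `HttpCallTask`) are simplified, hypothetical stand-ins so that the snippet compiles on its own; they are not db-scheduler classes, and the "HTTP client" is just a function returning a future with a status code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Simplified stand-ins, NOT the db-scheduler types: they only mirror the
// shape of the proposed interface so the sketch is self-contained.
enum Outcome { REMOVE, RESCHEDULE }

interface AsyncExecutionHandler<T> {
    CompletableFuture<Outcome> execute(T taskData);
}

// Hypothetical task: call a non-blocking HTTP client and decide, once the
// response arrives, whether the task instance is done or must be retried.
class HttpCallTask implements AsyncExecutionHandler<String> {
    // Assumed client shape: URL in, future of the HTTP status code out.
    private final Function<String, CompletableFuture<Integer>> httpClient;

    HttpCallTask(Function<String, CompletableFuture<Integer>> httpClient) {
        this.httpClient = httpClient;
    }

    @Override
    public CompletableFuture<Outcome> execute(String url) {
        return httpClient.apply(url)
            // No thread waits here; this runs when the response is available.
            .thenApply(status -> status < 500 ? Outcome.REMOVE : Outcome.RESCHEDULE)
            // A failed call (timeout, connection error) is retried as well.
            .exceptionally(error -> Outcome.RESCHEDULE);
    }
}
```

The point is only that the decision (remove or reschedule) is delivered through the future, so the scheduler would not need to hold a thread while the request is in flight.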

Thanks for your effort. This project and your care for it look very impressive.

coutoPL avatar Jun 04 '20 13:06 coutoPL

Hi! What level of parallelism per scheduler-instance do you need?

I haven't really thought about it before, but I agree it makes sense if you have many but slow executions. Trying to think about what the obstacles are...

kagkarlsson avatar Jun 04 '20 13:06 kagkarlsson

I don't see any obvious obstacles; I'd probably need to get a feel for how it would affect the code base to be able to evaluate it properly.

kagkarlsson avatar Jun 04 '20 13:06 kagkarlsson

I'd like to control parallelism through the thread pool set for a scheduler instance.

Let's say I give the scheduler a fixed pool of 5 threads, and the scheduler should run as many executions as it can. If a job/execution releases its thread (e.g. because it is waiting for IO), the next job/execution can run. The IO result of the previous execution will (or can) be handled, once the async response arrives, by a different thread than the one that initiated it.
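A rough illustration of that model, using only the JDK (Java 9+ for `CompletableFuture.delayedExecutor`): a fixed pool of 5 threads starts many simulated IO-bound executions, and the number in flight grows well past the pool size because no thread waits for the "response". The class and method names are made up for this sketch, and the delay merely simulates a slow HTTP call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SmallPoolManyInFlight {
    // Starts `jobs` simulated IO-bound executions on a pool of `threads`
    // threads and returns the highest number that were in flight at once.
    static int maxInFlight(int threads, int jobs, long ioMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // Simulated non-blocking IO: the follow-up is submitted to the pool
        // only after the delay, so no pool thread is held while "waiting".
        Executor afterIo = CompletableFuture.delayedExecutor(ioMillis, TimeUnit.MILLISECONDS, pool);

        List<CompletableFuture<Void>> all = new ArrayList<>();
        for (int i = 0; i < jobs; i++) {
            all.add(CompletableFuture
                // Initiation runs on a pool thread and returns immediately.
                .runAsync(() -> peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max), pool)
                // Handling the "response" may happen on a different pool thread.
                .thenRunAsync(inFlight::decrementAndGet, afterIo));
        }
        CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
        return peak.get();
    }
}
```

For example, `SmallPoolManyInFlight.maxInFlight(5, 50, 200)` should report a peak far above 5, since all 50 initiations finish long before the first simulated response arrives.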

Ok, if we decide to use db-scheduler, I'll try to add this functionality to the lib, if you don't mind :)

coutoPL avatar Jun 04 '20 13:06 coutoPL

Ok 👍

I am a bit uncertain about what the algorithm controlling how many executions the scheduler is allowed to pick would look like... 🤔
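One possible shape for such an algorithm, offered purely as a sketch and not as anything db-scheduler does: cap the number of in-flight executions with a semaphore that is independent of the thread-pool size, and let the polling loop pick only as many due executions as there are free slots. `InFlightLimiter` and its methods are hypothetical names.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper, not part of db-scheduler: caps the number of
// executions that are in flight, independent of the thread-pool size.
class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // How many more due executions a polling loop could pick right now.
    int availableSlots() {
        return permits.availablePermits();
    }

    // Starts the execution if a slot is free; otherwise returns empty so the
    // caller can leave the execution in the database for a later poll.
    <R> Optional<CompletableFuture<R>> tryStart(Supplier<CompletableFuture<R>> execution) {
        if (!permits.tryAcquire()) {
            return Optional.empty();
        }
        CompletableFuture<R> started;
        try {
            started = execution.get();
        } catch (RuntimeException e) {
            permits.release(); // starting failed, so give the slot back
            throw e;
        }
        // Free the slot when the execution finishes, successfully or not.
        return Optional.of(started.whenComplete((result, error) -> permits.release()));
    }
}
```

The limit would then become a separate setting (say, 500 in-flight executions on 5 threads), which also bounds how many picked-but-unfinished executions a node can hold if it goes down.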

kagkarlsson avatar Jun 04 '20 13:06 kagkarlsson

Me neither. I've not been there yet. But I'm going to consult you here on anything I'm uncertain about. Stay tuned :)

coutoPL avatar Jun 04 '20 14:06 coutoPL

@coutoPL, just curious, I'm doing a similar thing here with Axon: why not just reschedule in CompletableFuture's handle or thenApply?

muradm avatar Nov 28 '20 22:11 muradm

Hi guys, I'd like to revive this issue 🙃

We have a nearly identical use case to the OP's, just not with that many tasks in need of scheduling (yet?) and with a lower expected task duration, so even blocking the thread works for us for now. But that doesn't mean I wouldn't like to see support for this in the lib directly 🙂

I'd be willing to participate in building this feature, but I think it would be best to first properly think about how exactly such a feature would be incorporated into the current code base, no? Maybe, @kagkarlsson, if you could find some time to just think about this some day and write down your thoughts on the topic, that would be something to start from.

@muradm I thought about doing just that, simply registering a callback and doing your stuff in there, but there are a number of reasons why that doesn't work. The main one is that when you return from a task's execution handler, you need to somehow specify what should happen with that task instance:

  1. If you return CompletionHandler.OnCompleteRemove(), you've just marked the task as finished even though it has not finished yet. That's a weird state on its own, but it is also a real problem. Imagine your application suddenly crashes: the task instance has not finished, but it will never get picked up again, because it appears completed to the scheduler.
  2. You can't really return OnCompleteReschedule, since you don't know the outcome of your task at that moment.
  3. You could implement a no-op CompletionHandler, but then again you don't use the Scheduler properly. This time you'd make the task look "picked" even after it has finished.

Long story short: you may be able to make it (almost) work, but you'd need to duplicate parts of the logic currently being handled by the Scheduler for you, such as tracking of failures etc. And you can't really work around the issue that once your node goes down unexpectedly, you simply lose the state of the task instances currently being executed by that node, since that state is now only kept in your app's memory 🤷‍♂️
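The first problem can be shown with a toy model. This is not db-scheduler code: a map stands in for the task table, and removing the entry stands in for what happens when the handler returns a "remove" completion handler right away. All names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Toy model, NOT db-scheduler code: a map plays the role of the task table.
class CallbackWorkaroundDemo {
    final Map<String, String> taskTable = new ConcurrentHashMap<>();

    // The workaround: register a callback on the async call and return at once.
    void executeWithCallback(String taskId, CompletableFuture<Boolean> httpCall) {
        // Rescheduling now has to be done by hand, outside the scheduler's
        // own bookkeeping (failure tracking, heartbeats and so on).
        httpCall.thenAccept(success -> {
            if (!success) {
                taskTable.put(taskId, "SCHEDULED_AGAIN");
            }
        });
        // The handler has returned "remove", so the row is deleted right away,
        // even though the HTTP call is still pending.
        taskTable.remove(taskId);
    }
}
```

Between the removal and the callback, the only record of the task is the pending future in memory. If the process dies in that window, nothing persistent says the work was never finished.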

dmoidl avatar Feb 05 '21 13:02 dmoidl

I would just like to say that we have postponed this problem for later, so I'm not working on it atm. I didn't even start.

coutoPL avatar Feb 05 '21 13:02 coutoPL

I will think some on this use-case when I get some time :)

In the meantime, if you are considering ways to implement this, it might be good to know that I am working on a refactor of db-scheduler to support select-for-update polling. In doing that, I have extracted some of the execution logic into a separate class, possibly making this feature easier to implement (haven't checked). You can find the new code here: https://github.com/kagkarlsson/db-scheduler/pull/175/files#diff-d015624ee0d7dbc8378459e38bfffc0581736254a71f6d14b953343eff50089dR1

kagkarlsson avatar Feb 05 '21 13:02 kagkarlsson

I identified the interface which should be changed to express that a job can be done asynchronously.

public interface ExecutionHandler<T> {
   CompletableFuture<CompletionHandler<T>> execute(TaskInstance<T> taskInstance, ExecutionContext executionContext);
}

I was going to start there.
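If the interface changed like that, existing blocking handlers would presumably need a bridge. A minimal sketch of one option, again with simplified stand-in types (`SyncHandler`, `AsyncHandler`, `HandlerAdapters` are hypothetical and not db-scheduler classes):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Simplified stand-ins, NOT the db-scheduler types.
interface SyncHandler<T, R> {
    R execute(T taskInstance);
}

interface AsyncHandler<T, R> {
    CompletableFuture<R> execute(T taskInstance);
}

class HandlerAdapters {
    // Wraps an existing blocking handler so it satisfies the async shape.
    // The blocking work still occupies a thread of the given executor, so
    // this keeps old tasks working but does not make them non-blocking.
    static <T, R> AsyncHandler<T, R> fromBlocking(SyncHandler<T, R> handler, Executor executor) {
        return taskInstance ->
            CompletableFuture.supplyAsync(() -> handler.execute(taskInstance), executor);
    }
}
```

With an adapter along these lines, the scheduler internals would only have to deal with the future-returning form, while existing tasks stay unchanged.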

coutoPL avatar Feb 05 '21 13:02 coutoPL

This might also be relevant: https://github.com/kagkarlsson/db-scheduler/pull/175/files#diff-df287310caac61d8eb3a2d19d71d2eb3f4eca61634e9900b701cfc934dab09eaR34

kagkarlsson avatar Feb 05 '21 13:02 kagkarlsson

thanks folks, this is nice. I am looking to adopt db-scheduler and pair it up with kotlin coroutines to execute short tasks which are going to mainly interact with 2-3 other http services (async http client). hence, looking to execute tasks async'ly.

IIUC, if I implement ExecutorService using coroutines and initialize the Scheduler with it, then I am done? Appreciate the implementation and work so far. Love it.

UPDATE: never mind, I understand it needs more changes.

amit-handda avatar Mar 15 '22 02:03 amit-handda

@kagkarlsson we created a PR to address this issue, would be awesome if you could review it sometime. TYSM https://github.com/kagkarlsson/db-scheduler/pull/304

amit-handda avatar Jul 13 '22 22:07 amit-handda