
idempotency, double ack and only once.

Open gedw99 opened this issue 1 year ago • 9 comments

I really like Dagu and the other projects.

I don't think I can use Dagu the way it is now, because it doesn't protect against race conditions and duplicated work.

Could someone please elaborate on the idempotency of the current architecture?

Any workflow follows something like a message queue pattern: you want the system to run a workflow and, if it fails, rerun it so that only the steps that did not run before are executed.

For example, a message queue offers "only once" or "at least once" delivery.

When a Dagu operation has a side effect, like a file system change or an email being sent, we don't really have a way to control what it does.

If, however, there were the notion of a bus or message queue, then we could.

NATS JetStream can run in-process now, btw: https://github.com/nats-io/nats.go/blob/b61c7c554f188ff40e15fb3c4668bad959ccde86/nats.go#L870

And the way they test is to use NATS embedded and InProcess: https://github.com/nats-io/nats.go/blob/b61c7c554f188ff40e15fb3c4668bad959ccde86/test/nats_test.go#L1125

Many examples online:

https://github.com/search?q=nats.InProcessServer+language%3AGo&type=code&ref=advsearch

Often used for job runners!

https://github.com/synadia-io/rethink_connectivity/blob/8965aa656bbf913b7ac48f89cbc5c29f92a496ef/20-embedding-nats-server/main.go#L63

It already supports out-of-process, cluster and super-cluster deployments. I bring this up because, if the DAG pushed every "command" through NATS, then NATS could keep track of ensuring "only once" for us: if a command was already done, it won't be done again.

So you can run DAGs and know that, if a run failed, it will restart and only run what did not actually run last time.
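The resume behaviour described above can be sketched without any message bus. This is a minimal illustration, not Dagu's or NATS's code: `Step`, `Ledger` and `RunOnce` are hypothetical names, and the ledger is an in-memory map where a real system would need durable storage.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is a hypothetical unit of work in a DAG run; it is not a Dagu type.
type Step struct {
	Name string
	Run  func() error
}

// Ledger records which steps already completed, keyed by "runID/stepName".
// Assumption: kept in memory here; a real system would persist it.
type Ledger map[string]bool

// RunOnce executes steps in order, skipping any the ledger marks as done.
// It stops at the first failure so a later call can resume from that step.
func RunOnce(runID string, steps []Step, done Ledger) ([]string, error) {
	var executed []string
	for _, s := range steps {
		key := runID + "/" + s.Name
		if done[key] {
			continue // completed in an earlier attempt
		}
		if err := s.Run(); err != nil {
			return executed, fmt.Errorf("step %s: %w", s.Name, err)
		}
		done[key] = true
		executed = append(executed, s.Name)
	}
	return executed, nil
}

func main() {
	failOnce := true
	steps := []Step{
		{Name: "fetch", Run: func() error { return nil }},
		{Name: "email", Run: func() error {
			if failOnce {
				failOnce = false
				return errors.New("smtp unavailable")
			}
			return nil
		}},
	}
	done := Ledger{}
	first, err := RunOnce("run-1", steps, done)
	fmt.Println("first attempt ran:", first, "err:", err)
	second, err := RunOnce("run-1", steps, done)
	fmt.Println("second attempt ran:", second, "err:", err)
}
```

Note the gap this leaves: if a step performs its side effect and the process dies before the ledger is updated, the step runs again on restart. That is "at least once", so the step itself still has to be idempotent or the ledger update has to be atomic with the effect.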

NATS has the double ACK approach for this. Here are some devs explaining it better than me :)

https://www.reddit.com/r/NATS_io/comments/1b9tjmm/how_does_nats_jetstream_ensure_exactly_once/

gedw99 avatar Aug 29 '24 07:08 gedw99

Hi, thank you so much for the detailed information. As for the message queue, Dagu currently doesn't include this functionality. It executes the DAG immediately, whether or not it is already running. NATS is definitely an interesting option, but for now, I prefer to keep things as simple as possible. My plan is to have Dagu focus on being the execution layer, while queuing and task management would be handled by a controller layer managed by other software.

yottahmd avatar Aug 29 '24 08:08 yottahmd

We are building a queueing system within Dagu that lets users manage their jobs in Dagu. It has a configurable parameter called waiting queue length, which controls how many DAGs run concurrently. Retry jobs don't count toward it; only jobs started via the POST API or the UI are considered. We will try to create a pull request for it soon.

Actually, we were following the approach mentioned above, but it does not stop a user from running as many jobs as they want from the UI. Another issue with that approach is getting the list of running DAGs from the server side: the api/dags/v1 endpoint is very slow at reporting the number of running DAGs. So we have created a couple of data structures that keep the lists of running and waiting DAG IDs in JSON format. Our developer @kriyanshii is working on it, and we are thankful to her. We are currently using 1.14.3.

ghansham avatar Sep 17 '24 01:09 ghansham

hey you can check the stable version with queueing here: https://github.com/kriyanshii/dagu/tree/queue.

kriyanshii avatar Oct 06 '24 17:10 kriyanshii

it's inherent. will introduce an api for purging tho. so the idea is: initially, when you start the dag, it checks the number of dags currently running, and if that is equal to queueLength it queues it.
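The admission check described here, together with the JSON running/waiting lists mentioned earlier in the thread, might look roughly like the sketch below. It is not the code in the linked branch: `QueueState`, `Submit`, `Finish` and the JSON field names are assumptions made for illustration, and there is no locking or persistence.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// QueueState is a hypothetical stand-in for the running/waiting lists
// described in the thread; the field names are assumptions.
type QueueState struct {
	QueueLength int      `json:"queueLength"`
	Running     []string `json:"running"`
	Waiting     []string `json:"waiting"`
}

// Submit starts the DAG if a slot is free, otherwise appends it to the
// waiting list. It returns true when the DAG may start immediately.
func (q *QueueState) Submit(dagID string) bool {
	if len(q.Running) >= q.QueueLength {
		q.Waiting = append(q.Waiting, dagID)
		return false
	}
	q.Running = append(q.Running, dagID)
	return true
}

// Finish removes a DAG from the running list and promotes the oldest waiting
// DAG, if any. It returns the promoted ID, or "" when nothing was promoted.
func (q *QueueState) Finish(dagID string) string {
	for i, id := range q.Running {
		if id == dagID {
			q.Running = append(q.Running[:i], q.Running[i+1:]...)
			break
		}
	}
	if len(q.Waiting) == 0 || len(q.Running) >= q.QueueLength {
		return ""
	}
	next := q.Waiting[0]
	q.Waiting = q.Waiting[1:]
	q.Running = append(q.Running, next)
	return next
}

func main() {
	q := &QueueState{QueueLength: 2}
	for _, id := range []string{"a", "b", "c"} {
		fmt.Println(id, "started immediately:", q.Submit(id))
	}
	fmt.Println("promoted after a finished:", q.Finish("a"))
	out, err := json.Marshal(q)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping both lists in one small structure means the running count is a slice length, so the check does not need to go through a slow listing endpoint.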

kriyanshii avatar Oct 09 '24 15:10 kriyanshii

This feature may be more useful for non-cron DAGs, where the execution time is relatively long and data arrives from sensors at regular intervals for processing. Basically, batch processing jobs.

ghansham avatar Oct 15 '24 15:10 ghansham