[FEATURE] - Run a scheduled job once in a distributed system

Open marcsantiago opened this issue 3 years ago • 7 comments

Is your feature request related to a problem? Please describe

Currently, if I use this package on a cluster of AWS EC2 instances, every machine running production code also runs the code in the scheduler. For instance, if I need a background goroutine to scan data every minute and I have 5 machines deployed, I don't need or want all of them to scan the same data. It's also not cost-effective to spin up a single isolated EC2 instance to run a single job.

Describe the solution you'd like

I would like gocron to have the ability to connect to persistent storage through some sort of interface, so that users can plug in any persistent store (Redis, Memcached, SQL, etc.) and gocron can keep track of which jobs are running. If a job is being run by one machine and the settings allow only one instance of that job to run globally, then no other machine runs it. In essence, it would be safe to deploy code to a fleet of web servers and have only one web server run a given job at any time.

Some considerations:

  • When a cluster of machines is being deployed, restarted, or added to a load balancer, a race condition can occur because of the latency before the fastest instance's write reaches persistent storage. Example: machine A writes to storage saying "hey, I'm responsible for the job", and machine B checks storage a few minutes later, doesn't find anything there, and also says "hey, I'm responsible for the job". This happens because machines get updates in batches. Preventing it needs some sort of jitter, or perhaps a queue. For instance, suppose all the servers start up and schedule a job. The namespace for that job should be the same on each machine. If I'm using Redis, I would upsert the job under some shared ID. When the scheduler ticks on a machine, it pops from the queue: if there is an item, that machine is granted the right to run the job; otherwise the job does not run there. When the job completes on the machine with the successful pop, it adds the job ID back to the queue for the next tick... something like that.

Describe alternatives you've considered

  • Spinning up isolated EC2 instances that do nothing other than run a single job, at the cost of operational cognitive load and money
  • Using other distributed systems like machinery; however, projects like machinery aren't necessarily meant for cron-like tasks
  • Compiling a Go binary and using the native Linux cron system, but again this means moving away from the very clean package and system that gocron presents

marcsantiago avatar May 27 '21 14:05 marcsantiago

Definitely something we’re interested in supporting

JohnRoesler avatar May 28 '21 00:05 JohnRoesler

I like the way that this platform (https://benthos.dev/) abstracts different data sources, and I think it would be ideal to add a reasonable interface to gocron so that we could support multiple specific implementations like redis, ec2, etc.

JohnRoesler avatar Jun 10 '21 00:06 JohnRoesler

The original version had SetLocker where you were able to provide an implementation to lock and unlock. Is there still something similar?

ianaz avatar Jun 28 '21 09:06 ianaz

@ianaz it never worked properly so we removed it. That’d be a good place to start, adding a locker interface.

JohnRoesler avatar Jun 28 '21 13:06 JohnRoesler

In a cluster environment, the nodes in the cluster are known. Maybe a consistent hash could be used here to decide which node runs which jobs.

derkan avatar Jul 02 '21 07:07 derkan
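
The hashing idea above can be sketched without any shared storage, assuming every node sees the same membership list. Note the swap: the example below uses rendezvous (highest-random-weight) hashing, a simpler relative of a consistent-hash ring, and the names `Owner` and `ShouldRun` are hypothetical. Each node scores every (node, job) pair and runs the job only if it has the highest score.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes a (node, job) pair. The zero byte separates the two
// strings so that ("ab", "c") and ("a", "bc") hash differently.
func score(node, jobID string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(node))
	h.Write([]byte{0})
	h.Write([]byte(jobID))
	return h.Sum64()
}

// Owner returns the node with the highest score for jobID, breaking
// ties by node name. It returns "" when nodes is empty. The result does
// not depend on the order of nodes.
func Owner(nodes []string, jobID string) string {
	best := ""
	var bestScore uint64
	for i, n := range nodes {
		s := score(n, jobID)
		if i == 0 || s > bestScore || (s == bestScore && n < best) {
			best, bestScore = n, s
		}
	}
	return best
}

// ShouldRun reports whether the node named self owns jobID.
func ShouldRun(self string, nodes []string, jobID string) bool {
	return Owner(nodes, jobID) == self
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	for _, job := range []string{"scan-data", "weekly-email"} {
		fmt.Printf("%s -> %s\n", job, Owner(nodes, job))
	}
}
```

With this scheme, removing a node only reassigns the jobs that node owned. The trade-off is that nodes with briefly different views of the membership list can disagree about the owner, which a lock-based approach avoids.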

@derkan that is an interesting idea. Perhaps we could introduce some sort of distributed provider that could have different implementations. I was thinking something along the lines of a redis cache as a locker for jobs - example

JohnRoesler avatar Sep 30 '21 03:09 JohnRoesler

Hi guys, @marcsantiago @JohnRoesler, I wanted to know what your final thoughts on the subject were.

I have a similar situation where I am using gocron to send out a status email once a week. The program runs on several servers for efficiency. I am looking into implementing the suggested "Distributed Locks with Redis" and would like to hear whether this was successful for you, and maybe even get some advice and things to watch out for while I pursue this solution. Thanks

avimess23 avatar Jun 24 '22 06:06 avimess23

I wonder if the approach here could be similar to APScheduler: begin with out-of-the-box (OOTB) support for persistent stores like Redis/Postgres. Once an interface can be reasonably standardized, allow people to pass in their own implementations that satisfy it, and the core would handle queues/locks, moving on to the next available, etc.

I realize the current gocron kind of gives everyone the ability to roll their own version of the above, but it would be nice to provide that as a built-in option.

dnitsch avatar Nov 07 '22 10:11 dnitsch

Hello, is this feature supported? I'm looking for something like ShedLock for jobs in a distributed system. I think having this included in the library would be better than implementing it in my own project.

vuhoanghiep1993 avatar Dec 22 '22 04:12 vuhoanghiep1993

Hi, I am also interested in this

manuelarte avatar Jan 26 '23 08:01 manuelarte

If anyone is able to test out the feature I have in the works that would be great! v1.24.0-rc2

JohnRoesler avatar May 03 '23 02:05 JohnRoesler

I've released distributed locker support with redis and marked it as in beta. Please test and provide feedback when you have a chance!

JohnRoesler avatar May 05 '23 02:05 JohnRoesler

Maybe this is not the place, but this is a locker I created and could be used as an example/inspiration for someone else with the same needs:

https://github.com/go-co-op/gocron/discussions/529

manuelarte avatar Jul 27 '23 07:07 manuelarte