[Bug]: Singleton setting seems unusable
DO NOT REPORT VULNERABILITIES HERE!
- [x] By checking this box I am confirming this issue is NOT a vulnerability.
Is there an existing issue for this?
- [x] I have searched the existing issues
What happened?
Hello, we are running 60-ish cron jobs, most of which upload files and symlinks via SSH to an FTP server.
The setup is 6 worker machines (2 heavy-duty with more RAM, 4 lightweight), plus a Primary and a Backup. The 6 workers are grouped; Primary/Backup have no jobs assigned. Only the 4 lightweight machines are moving data to the FTP server, and all of the jobs are set to Singleton.
Lots of jobs are now overlapping, causing issues with the symlinks but also with bandwidth.
You mentioned it's on your TODO list. Is this already addressed somewhere in a way that requires a setting on our side?
Thanks for your time @jhuckaby !!
Operating System
AlmaLinux 9.6
Node.js Version
22.16.2
Cronicle Version
0.9.91
Server Setup
Multi-Primary with Workers
Storage Setup
NFS Filesystem
Relevant log output
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
As you can see, multiple jobs are listed - these are in some cases 6 days old.
Do you actually see multiple singleton jobs running at the same time in the Cronicle UI? I'd love to see Cronicle logs, and Cronicle screenshots showing the singleton violation, so I can properly troubleshoot this. I don't think your issue is related to #376, as that was someone flooding the API with run_event HTTP requests. I think your issue may be something entirely different.
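For reference, the kind of direct API call involved in #376 looks roughly like this (hostname, event ID, and API key below are placeholders; see Cronicle's API docs for the exact parameters):

```sh
# Hypothetical run_event API call (endpoint per Cronicle's docs);
# #376 involved flooding this endpoint with requests.
curl -s "https://cronicle.example.com/api/app/run_event/v1" \
  -H "Content-Type: application/json" \
  -d '{"id": "PLACEHOLDER_EVENT_ID", "api_key": "PLACEHOLDER_API_KEY"}'
```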
> As you can see, multiple jobs are listed - these are in some cases 6 days old.
I see your screenshot showing multiple processes in top, yes, but I don't see what Cronicle is seeing. What does the Cronicle UI show (home tab)? Are the 6-day old jobs still showing as running in Cronicle? So what you're saying is that it's then launching new ones on top of the old jobs, all with the singleton flag set?
It sounds like you may have an issue where the job "completes" (or fails, aborts, etc.) in Cronicle, but you still have processes running in the background. See #248 for more details on this.
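If the wrapper script traps Cronicle's abort signal and kills its own children, the transfer dies with the job. A minimal sketch, assuming a plain shell wrapper around rsync (host and paths are placeholders):

```sh
#!/bin/sh
# Run the transfer in the background so this wrapper stays free to
# react to signals from Cronicle.
rsync -a /data/source/ user@ftp-host:/data/dest/ &
RSYNC_PID=$!

# If Cronicle aborts or times out the job (SIGTERM), kill the child
# rsync instead of leaving it orphaned; 143 = 128 + SIGTERM(15).
trap 'kill "$RSYNC_PID" 2>/dev/null; exit 143' TERM INT

# Block until rsync finishes; a trapped signal interrupts the wait.
wait "$RSYNC_PID"
```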
I didn't take a screenshot of the Cronicle interface, but it showed that on the worker the screenshots were taken from, 1 job was running. I will see if I can recreate this - shouldn't be too hard with that many jobs :)
Stopping and starting Cronicle on the worker resolved the issue, and the rsync processes were removed.
I will look through the linked issue and also test adding a trap.
Thanks for the fast response!
Here you can see that IP .60 has 3 running jobs, but the worker has 6 processes. I had to redact, but all 6 processes have cron jobs specified.
Well, rsync always forks its own child process, so that explains the 6 processes for 3 jobs (example: rsync PID 2467876 is a child process of rsync PID 2466519).
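One quick way to confirm that parent/child pairing on a worker is to list the rsync processes with their parent PIDs:

```sh
# Each transfer should show up as a pair: one rsync launched by the
# job, and a second rsync whose PPID is the first rsync's PID.
ps -o pid,ppid,etime,args -C rsync
```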
But my question is about the 3 running jobs on host acw03-lw. Are you saying those 3 jobs all launched from an event that has the concurrency set to "Singleton"? If so, were they launched by API or by Cronicle's scheduler?
All of our jobs are running in 'Singleton' mode.
Here is maybe a better example, as there are two jobs hanging from the 21st of October.
Edit: and one from this morning. Edit edit: all jobs are added in the UI; I am assuming this means the Cronicle scheduler?
That's really very strange. The code that governs the job concurrency setting (i.e. singleton) hasn't changed in over a decade. It's super simple:
https://github.com/jhuckaby/Cronicle/blob/master/lib/job.js#L117-L124
I cannot fathom how you are somehow getting multiple parallel jobs past this barrier. If a job is running (on any server), it should prevent any subsequent jobs from launching. Very weird!
Well, I don't know what else to do here, but I will keep this issue open and see if anyone else from the community has any ideas. I'm at a loss as to how this is breaking so badly.
Here is my test. The logic is 100% correct in this case. I created a test event which runs for 5 minutes, and has singleton set:
It is also scheduled to launch 3 more times after the first launch, each one minute apart from the last.
Here's the progress as the job is running:
You can see 3 upcoming events scheduled, but they never successfully launch. They all fail (as they should) because a job is still active:
All expected and normal behavior.
I am unable to reproduce your issue 😞
Thanks for taking the time :)
Potentially it's the sheer amount of data flowing. We have folders syncing with millions of files, upload traffic in the terabytes, and tricky rsync includes and excludes. We'll keep testing, as we really like Cronicle, and going back to a VM with crontab is not something I'd like to do.
If I find something, I'll report back!
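For illustration, the "tricky includes and excludes" might look something like the filter chain below (paths and patterns are made up; in rsync, the first matching filter rule wins, so ordering mistakes can silently change what gets transferred):

```sh
# Hypothetical filter chain: keep the directory tree visible for
# recursion, include one file type, exclude everything else.
rsync -a \
  --include='*/' \
  --include='*.dat' \
  --exclude='*' \
  /data/source/ user@ftp-host:/incoming/
```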
Sure thing, and I'm really sorry this is happening. Let me know if you learn anything else.
Have you tried setting the concurrency at the category level?
This is set to 'No Limit', as we have project-based categories.
A project has jobs that could depend on one another, but also jobs that need to run independently, potentially at a higher frequency.
I can try, but I fear that would slow down production.