
[Bug]: Singleton setting seems unusable

Open jurweer opened this issue 5 months ago • 11 comments

DO NOT REPORT VULNERABILITIES HERE!

  • [x] By checking this box I am confirming this issue is NOT a vulnerability.

Is there an existing issue for this?

  • [x] I have searched the existing issues

What happened?

Hello, we are running ~60 cron jobs, most of which upload files and symlinks via SSH to an FTP server.

The setup is 6 worker machines (2 heavy-duty with more RAM, 4 lightweight), plus a Primary and a Backup. The 6 workers are grouped; Primary/Backup have no jobs assigned. Only the 4 lightweight machines are moving data to the FTP. All of the jobs are singleton.

Lots of them are now overlapping, causing issues with the symlinks, but also with the bandwidth.

You mentioned it's on your TODO list. Is this addressed somewhere already that just requires a setting on our side?

Thanks for your time @jhuckaby !!

Operating System

AlmaLinux 9.6

Node.js Version

22.16.2

Cronicle Version

0.9.91

Server Setup

Multi-Primary with Workers

Storage Setup

NFS Filesystem

Relevant log output


Code of Conduct

  • [x] I agree to follow this project's Code of Conduct

jurweer avatar Oct 22 '25 14:10 jurweer

As you can see multiple jobs are listed - these are in some case 6 days old.

Image Image

jurweer avatar Oct 22 '25 14:10 jurweer

Do you actually see multiple singleton jobs running at the same time in the Cronicle UI? I'd love to see Cronicle logs, and Cronicle screenshots showing the singleton violation, so I can properly troubleshoot this. I don't think your issue is related to #376, as that was someone flooding the API with run_event HTTP requests. I think your issue may be something entirely different.

As you can see multiple jobs are listed - these are in some case 6 days old.

I see your screenshot showing multiple processes in top, yes, but I don't see what Cronicle is seeing. What does the Cronicle UI show (home tab)? Are the 6-day old jobs still showing as running in Cronicle? So what you're saying is that it's then launching new ones on top of the old jobs, all with the singleton flag set?

It sounds like you may have an issue where the job "completes" (or fails, aborts, etc.) in Cronicle, but you still have processes running in the background. See #248 for more details on this.

jhuckaby avatar Oct 22 '25 16:10 jhuckaby

I didn't make a screenshot of the Cronicle interface, but it showed that on the worker the posted screenshots were taken from, 1 job was running. I will see if I can recreate this; it shouldn't be too hard with that many jobs :)

Starting and stopping Cronicle on the worker resolved the issue and the rsync processes got removed.

I will look through the linked issue and also test adding a trap.

thanks for the fast response

jurweer avatar Oct 22 '25 16:10 jurweer

Image

Here you can see that IP .60 has 3 running jobs, but the worker has 6 processes. I had to redact, but all 6 processes have cron jobs specified.

jurweer avatar Oct 22 '25 17:10 jurweer

Well, rsync always forks its own child process, so that explains the 6 processes for 3 jobs (example: rsync PID 2467876 is a child process of rsync PID 2466519).
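A quick way to check this on a worker (a sketch, assuming procps `ps` as shipped with AlmaLinux; the bash/sleep pair below just mimics rsync's two-process structure so the listing has something to show):

```shell
# Mimic an rsync transfer: a parent process that forks one child.
bash -c 'sleep 3 & wait' &
parent=$!
sleep 0.2                        # let the child start

# On a real worker you would instead run:
#   ps -eo pid,ppid,comm | awk '$3 == "rsync"'
# Any rsync whose PPID is another rsync's PID is the forked half of a
# single transfer, not a separate Cronicle job.
pair=$(ps -o pid,ppid,comm -p "$parent" --ppid "$parent")
echo "$pair"

kill "$parent" 2>/dev/null
```

The listing shows the parent and its child on adjacent lines, with the child's PPID column pointing at the parent's PID.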

But my question is about the 3 running jobs on host acw03-lw. Are you saying those 3 jobs all launched from an event that has the concurrency set to "Singleton"? If so, were they launched by API or by Cronicle's scheduler?

jhuckaby avatar Oct 22 '25 17:10 jhuckaby

All of the jobs we have are running in 'Singleton' mode.

Here is maybe a better example, as there are two jobs hanging from the 21st of October.

Edit: and one from this morning. Edit edit: all jobs are added in the UI; I am assuming that means the Cronicle scheduler?

Image

jurweer avatar Oct 22 '25 17:10 jurweer

That's really very strange. The code that governs the job concurrency setting (i.e. singleton) hasn't changed in over a decade. It's super simple:

https://github.com/jhuckaby/Cronicle/blob/master/lib/job.js#L117-L124

I cannot fathom how you are somehow getting multiple parallel jobs past this barrier. If a job is running (on any server), it should prevent any subsequent jobs from launching. Very weird!

Well, I don't know what else to do here, but I will keep this issue open and see if anyone else from the community has any ideas. I'm at a loss as to how this is breaking so badly.

jhuckaby avatar Oct 22 '25 17:10 jhuckaby

Here is my test. The logic is 100% correct in this case. I created a test event which runs for 5 minutes, and has singleton set:

Image

It is also scheduled to launch 3 more times after the first launch, each one minute apart from the last.

Here's the progress as the job is running:

Image

You can see 3 upcoming events scheduled, but they never successfully launch. They all fail (as they should) because a job is still active:

Image

All expected and normal behavior.

I am unable to reproduce your issue 😞

jhuckaby avatar Oct 22 '25 17:10 jhuckaby

thanks for taking the time :)

Potentially it's the sheer amount of data flowing. We have folders syncing with millions of files, upload traffic measured in TBs, and tricky rsync includes and excludes. We'll keep testing, as we really like Cronicle, and going back to a VM with crontab is not something I'd like to do.

If I find something I'll report back!

jurweer avatar Oct 22 '25 17:10 jurweer

Sure thing, and I'm really sorry this is happening. Let me know if you learn anything else.

Have you tried setting the concurrency at the category level?

Image

jhuckaby avatar Oct 22 '25 17:10 jhuckaby

This is set to no limit, as we have project-based categories.

A project has jobs that could be dependent on one another, but also jobs that need to run independently, potentially at a higher frequency.

I can try, but I fear that would slow down production.

jurweer avatar Oct 22 '25 18:10 jurweer