Cronicle reports failure on starting up long lived processes
Summary
I have processes that need to run for the duration of the day. The way I've attempted to set this up in Cronicle is to wrap the C++ executable in a script with start and stop commands. The stop command is a kill -15/-9, and Cronicle successfully runs this stop job and reports success.
For the start, however, the actual C++ executable does start and run, but Cronicle itself reports failure. I've tried running this in the background as well as explicitly returning 0, but neither works.
Steps to reproduce the problem
Your Setup
#!/bin/bash
APP=<some long running app>
LOG=<some log>
$CMD > $LOG &
exit 0
Operating system and version?
Centos 7.6
Node.js version?
14.17.2
Cronicle software version?
0.8.62
Are you using a multi-server setup, or just a single server?
Single Server
Are you using the filesystem as back-end storage, or S3/Couchbase?
Filesystem
Can you reproduce the crash consistently?
Yes
Log Excerpts
Date/Time: 2021/07/14 07:55:11 (GMT-5)
Event Title: MDListener Start
Category: ZeroMQ Listener Server
Target: All Servers
Plugin: Shell Script
Job ID: jkr3ff21x1p
Hostname: ccapcbot-01
PID: 43999
Elapsed Time: 1 hour
Performance Metrics: (No metrics provided)
Avg. Memory Usage: 23.8 MB (Peak: 23.8 MB)
Avg. CPU Usage: 0% (Peak: 0.8%)
Error Code: 1
Error Description: Job Aborted: Exceeded maximum run time (1 hour)
Your log says "timeout" — Cronicle kills the job after 1 hour by default. Is that the case? If so, just change the timeout setting.
No, the process continues running and is not killed by Cronicle at all. Cronicle actually starts the process successfully, the process runs (and obviously doesn't return right away), and then after an hour Cronicle times out and reports failure. I'm guessing the issue here is that my process doesn't return an exit code (and it won't), so I'm looking for a workaround on how to set this up.
Changing the timeout setting to something larger will only delay the failure report; it won't mark the task as successful.
OK. First of all, you can disable the timeout entirely (set it to 0). As I understand it, even with no timeout, your process will start but then report an error at the end no matter what? You should see the exit code / stderr messages in the job history upon completion. What does it say? Also, are you sure your script won't return a non-zero exit code if you run it in a terminal?
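(For anyone following along, a quick way to do that last check in a terminal. The /tmp/wrapper.sh path and its contents are just a stand-in for the poster's actual start script:)

```shell
# Create a stand-in wrapper that backgrounds a command and exits 0,
# then inspect the exit code the same way Cronicle's shell plugin does.
printf '#!/bin/bash\nsleep 5 > /dev/null &\nexit 0\n' > /tmp/wrapper.sh
chmod +x /tmp/wrapper.sh
/tmp/wrapper.sh
echo "exit code: $?"   # prints "exit code: 0"
```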
Wait, do you just want to kill your job via Cronicle's timeout and report success? If that's the case, I don't think it's possible. On the other hand, you can do this in your script. E.g. if your command is "sleep":
timeout 5 sleep 10 || true
Maybe I'm doing a bad job of explaining this.
My process never exits, so there is no exit code and no stderr. This is by design. As an example, this is a process that starts at 6am and needs to be stopped at 8pm.
So I have two Cronicle events:
At 6am => "Start" my process. Cronicle properly calls my start script and the process is started, but at 7am, Cronicle (after the default 1 hr timeout) marks the event as failed. I guess what I need is a way to convey to Cronicle that this is not a failure: this is a long-running process that will not return an exit code. I just don't know how to set this up. Changing the timeout to zero just ends up reporting a failure ASAP instead of 1 hr later. When run from a terminal, this same script has a zero return code.
At 8pm => "Stop" my process. This works fine. Cronicle properly calls my stop script, which is effectively calling kill -9 <pid>. My process is stopped, and Cronicle reports the event as successful.
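(For completeness, here is a sketch of what such a stop wrapper can look like when the start script has recorded the PID in a file. The /tmp/app.pid path, the "sleep 300" stand-in, and the SIGTERM-then-SIGKILL escalation are illustrative assumptions, not the poster's actual script:)

```shell
#!/bin/bash
# Demo setup (assumption): a backgrounded "sleep 300" stands in for the
# long-running app, with its PID recorded the way a start wrapper would.
sleep 300 & echo $! > /tmp/app.pid

# Hypothetical stop wrapper logic: escalate from SIGTERM to SIGKILL.
PID=$(cat /tmp/app.pid)
kill -15 "$PID" 2>/dev/null || true  # polite shutdown request first
sleep 1
kill -9 "$PID" 2>/dev/null || true   # force-kill if it ignored SIGTERM
rm -f /tmp/app.pid                   # clean up so the next start is fresh
```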
Okay, so I think I see what is going on here. You're doing this:
#!/bin/bash
APP=<some long running app>
LOG=<some log>
$CMD > $LOG &
exit 0
What happens here is, the $CMD is still "attached" to the terminal, even though you are launching it in the background with the &. This is a Linux quirk, by design. So the problem is, even though you did a background exec, Cronicle is still "waiting" for it to exit. I think you need to fully detach your sub-process from the terminal.
I believe the shell command you want is called disown: https://www.cyberciti.biz/faq/unix-linux-disown-command-examples-usage-syntax/
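(A minimal sketch of what a fully detached start wrapper can look like, written so it can be run end to end. The /tmp paths and the "sleep 60" stand-in are assumptions, not the poster's actual setup. Besides disown, redirecting stdin/stdout/stderr away from the terminal likely matters too, since an inherited output pipe can keep the parent waiting on the child:)

```shell
# Write the wrapper to a temp path, then run it once as a demo.
cat > /tmp/start_app.sh <<'EOF'
#!/bin/bash
APP="sleep 60"      # stand-in for <some long running app>
LOG=/tmp/app.log    # stand-in for <some log>

# Detach fully: redirect all three stdio streams away from the terminal
# and ignore SIGHUP, so nothing ties the child back to the Cronicle job.
nohup $APP > "$LOG" 2>&1 < /dev/null &
echo $! > /tmp/app.pid   # record the PID for a stop script
disown                   # remove the job from this shell's job table
exit 0
EOF
chmod +x /tmp/start_app.sh
/tmp/start_app.sh   # returns immediately; "sleep 60" keeps running detached
```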
Ah, thanks for the explanation. I'll give this a try!