Cronicle
Cronicle fails after 7 days of work
After about 7 days of operation, the slave server starts generating repeated entries in the activity log: it loses connectivity for a few seconds and then comes back, which interferes with the jobs running on it.
Summary
Cronicle slave server fails after ~7 days of work
Steps to reproduce the problem
My tests were based on letting it run for seven days, and I hit this problem consistently.
Your Setup
I have two EC2 instances on AWS: the slave is a t2.large and the master is a t2.small. The master is for the UI only and does not run any jobs, while the slave runs all jobs. I use S3 as the storage system for both. These failures usually occur at peak times (when I run an average of 20 to 30 jobs in the same minute). CPU usage on the slave (t2.large) never went above 30%.
Operating system and version?
Ubuntu 20.04.2 LTS
Node.js version?
v17.7.2
Cronicle software version?
0.9.2
Are you using a multi-server setup, or just a single server?
Multi-server setup: one master and one slave
Are you using the filesystem as back-end storage, or S3/Couchbase?
S3
Can you reproduce the crash consistently?
Only if I let it run for seven days
Log Excerpts
It doesn't generate any logs for me
@jhuckaby
I've never heard of this happening before. I run a large Cronicle cluster of many servers on live production for months at a time, with no issues like this.
It sounds like the server may be running out of memory? I can't think of anything else that would cause a random disconnection after 7 days.
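One way to test the out-of-memory theory is to record memory readings on the slave over the week and see whether free memory shrinks toward the 7-day mark. The following is a minimal sketch, not part of Cronicle: the function names `memorySnapshot` and `formatSnapshot` are hypothetical, and it reports system-wide memory as seen by Node's `os` module rather than the Cronicle process itself.

```javascript
// Sketch: take a system-wide memory reading using Node's built-in os module.
// memorySnapshot and formatSnapshot are illustrative names, not Cronicle APIs.
const os = require("os");

const BYTES_PER_MB = 1024 * 1024;

function memorySnapshot() {
  const totalMb = os.totalmem() / BYTES_PER_MB;
  const freeMb = os.freemem() / BYTES_PER_MB;
  // Share of total memory that os.freemem() does not report as free.
  const usedPct = (1 - freeMb / totalMb) * 100;
  return { time: new Date().toISOString(), totalMb, freeMb, usedPct };
}

function formatSnapshot(snap) {
  return `${snap.time} total=${snap.totalMb.toFixed(0)}MB ` +
    `free=${snap.freeMb.toFixed(0)}MB used=${snap.usedPct.toFixed(1)}%`;
}

// Example usage (the one-minute interval is an arbitrary choice):
// setInterval(() => console.log(formatSnapshot(memorySnapshot())), 60 * 1000);
```

Redirecting that output to a file for the full seven days would show whether the disconnections line up with rising memory use or are unrelated to it.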
@srgoogle23 It could also be a network issue. Do those machines have static IPs? In any case, you can check the logs/Cronicle.log file to see whether Cronicle was crashing or restarting, or whether the VM restarted itself. It doesn't sound like a Cronicle issue.
It's not a VM problem: CloudWatch hasn't issued any down alerts, and Cronicle hasn't written any logs yet. It's as if it doesn't actually disconnect.
@jhuckaby I will check the memory tomorrow morning and post what I find here.
@jhuckaby It's not a memory issue.
[1650306586.418][2022-04-18 18:29:46][crons.zukk.in][441580][Error][error][job][Failed to fetch job log file: http://172.31.60.150:3012/api/app/fetch_delete_job_log?path=%2Fopt%2Fcronicle%2Flogs%2Fjobs%2Fjl251w6er9c.log&auth=38b937b3eeef304e302013184d86ab7b39bb6845d6a8a64e2dd0b49a78de7ffb: Error: Socket Timeout (30000 ms)][]
[1650306593.336][2022-04-18 18:29:53][crons.zukk.in][441580][Error][error][server][Slave connection failed: crons-worker.zukk.in: Error: timeout][]
@jhuckaby Error.log
@jhuckaby This is the first time it has made it past 7 days:
@jhuckaby I checked on AWS, and CloudWatch didn't send any alert about a server failure.
Log on the slave (WebServer.log):
[1650306571.02][2022-04-18 18:29:31][crons-worker.zukk.in][4502][WebServer][error][socket][Socket closed unexpectedly: c1049018][{"id":"c1049018","proto":"http","port":3012,"time_start":1650306434523,"num_requests":0,"bytes_in":0,"bytes_out":0,"aborted":true,"total_elapsed":136496,"url":"http://xxx.xxx.xxx.xxx:3012/api/app/fetch_delete_job_log?path=%2Fopt%2Fcronicle%2Flogs%2Fjobs%2Fjl251tzn28e.log&auth=90d41b9d5dcaa1709a3dd21706dbf65da29cee5199a4f239da841afd953314c3","ips":["xxx.xxx.xxx.xxx"],"req_id":"r1049034"}]
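To check whether these timeouts cluster around the peak minutes, the log lines quoted above can be parsed and counted per minute. This is a minimal sketch that assumes every line keeps the nine bracketed fields seen in the excerpts. The field names (`component`, `category`, `code`, and so on) and the helper names `parseLogLine` and `countPerMinute` are guesses made for illustration, not taken from Cronicle's documentation.

```javascript
// Sketch: parse bracket-delimited log lines shaped like the excerpts in this thread.
// The first seven fields are assumed to contain no "]"; the message and the
// trailing data column are split at the last "][" in the line.
const LINE_RE =
  /^\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[(.*)\]\[(.*)\]$/;

function parseLogLine(line) {
  const m = LINE_RE.exec(line.trim());
  if (!m) return null; // not in the expected layout
  const [, epoch, date, hostname, pid, component, category, code, msg, data] = m;
  return {
    epoch: Number(epoch), date, hostname, pid: Number(pid),
    component, category, code, msg, data,
  };
}

// Count lines whose message contains `needle`, grouped by "YYYY-MM-DD HH:MM".
function countPerMinute(lines, needle) {
  const counts = new Map();
  for (const line of lines) {
    const entry = parseLogLine(line);
    if (!entry || !entry.msg.includes(needle)) continue;
    const minute = entry.date.slice(0, 16);
    counts.set(minute, (counts.get(minute) || 0) + 1);
  }
  return counts;
}

// Example usage (the file path is an assumption about where Error.log lives):
// const fs = require("fs");
// const lines = fs.readFileSync("/opt/cronicle/logs/Error.log", "utf8").split("\n");
// console.log(countPerMinute(lines, "Slave connection failed"));
```

Comparing the per-minute counts against the schedule of the 20 to 30 concurrent jobs would show whether the disconnections follow load or appear independently of it.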
I disabled both (master and slave), and after starting them again I am getting this error:
Mon Apr 18 2022 18:58:40 GMT+0000 (Coordinated Universal Time) - crons-worker.zukk.in - PID 1193
RangeError: Maximum call stack size exceeded
at debug (/opt/cronicle/node_modules/debug/src/common.js:68:15)
at Socket.sendPacket (/opt/cronicle/node_modules/engine.io/build/socket.js:372:13)
at Socket.write (/opt/cronicle/node_modules/engine.io/build/socket.js:351:14)
at Client.writeToEngine (/opt/cronicle/node_modules/socket.io/dist/client.js:171:23)
at Client._packet (/opt/cronicle/node_modules/socket.io/dist/client.js:160:14)
at Socket.packet (/opt/cronicle/node_modules/socket.io/dist/socket.js:179:21)
at Socket.emit (/opt/cronicle/node_modules/socket.io/dist/socket.js:97:14)
at constructor.masterSocketEmit (/opt/cronicle/lib/engine.js:326:12)
at constructor.uploadJobLog (/opt/cronicle/lib/job.js:1209:32)
at /opt/cronicle/lib/engine.js:765:14