Web dashboard fails to load -- Network Error: 500, Request timed out

Open yanicakj opened this issue 4 years ago • 4 comments

Summary

When visiting the Cronicle web dashboard in a browser (Chrome), it sporadically shows a "Network Error: 500, Request timed out" warning bar at the top. To get around this, we refresh the page until the error goes away, which can take a very long time.

Visiting http://ipaddress:3012/#Home ::

image

This seems to happen at random and it can persist for a very long time. Some days it will load with no problems very quickly. Other days it can continuously give this error for hours.

It seems that after enough refreshes, the dashboard will load correctly, but then another refresh a few seconds later can cause the 500 error loop.

Steps to reproduce the problem

Please see "Your Setup" section for important information on why this may be happening.

Step 1: Run Cronicle, submit a few jobs to be executed soon
Step 2: Navigate to http://ipaddress:3012/#Home

Your Setup

Our setup has two parts. The first is Cronicle running on an AWS EC2 instance of size m5.large.

It is a single instance of Cronicle, only 1 server, the master server. It is all running on this single instance.

The 2nd part of our setup includes a python program that hits the Cronicle REST API very often:

  1. get_event is hit every .5 seconds
  2. get_schedule is hit every ~minute
  3. create_event & delete_event are hit at most 10 times/minute

This Python program runs on the same EC2 instance. It essentially scrapes a calendar and continually checks whether the calendar's meetings are present in Cronicle. If not, it creates the meetings in Cronicle as jobs. If meetings are deleted from the calendar, it deletes them from Cronicle.

These requests are all synchronous and never execute at the same time; they happen in order, each waiting for the previous one to finish.

Operating system and version?

NAME="Amazon Linux AMI"
VERSION="2018.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2018.03"
PRETTY_NAME="Amazon Linux AMI 2018.03"

Node.js version?

v10.19.0

Cronicle software version?

Version 0.8.46

Are you using a multi-server setup, or just a single server?

Single server

Are you using the filesystem as back-end storage, or S3/Couchbase?

Filesystem

Can you reproduce the crash consistently?

Sort-of?

Log Excerpts

[1595629150.414][2020-07-24 22:19:10][ip-54-40-19-139][30218][WebServer][debug][9][Request performance metrics:][{"scale":1000,"perf":{"total":997.916,"read":0.297,"process":129.971,"write":867.571},"counters":{"bytes_in":288,"bytes_out":178,"num_requests":1}}]
[1595629150.414][2020-07-24 22:19:10][ip-54-40-19-139][30218][WebServer][debug][9][Keeping socket open for keep-alives: c83][]
[1595629150.418][2020-07-24 22:19:10][ip-54-40-19-139][30218][WebServer][debug][8][HTTP connection has closed: c83][{"ip":"::ffff:54.40.16.8","total_elapsed":1003,"num_requests":1,"bytes_in":288,"bytes_out":178}]
[1595629153.619][2020-07-24 22:19:13][ip-54-40-19-139][30218][WebServer][debug][8][HTTP connection has closed: c80][{"ip":"::ffff:54.31.48.215","total_elapsed":10637,"num_requests":1,"bytes_in":401,"bytes_out":5698}]
[1595629153.62][2020-07-24 22:19:13][ip-54-40-19-139][30218][WebServer][error][socket][Socket connection terminated unexpectedly during response][{"ips":["54.31.48.215"],"useragent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36","referrer":"http://54.40.16.8:3012/","cookie":"","url":"http://54.40.16.8:3012/api/user/resume_session"}]
[1595629153.688][2020-07-24 22:19:13][ip-54-40-19-139][30218][WebServer][debug][9][Compressed text output with gzip: 70348812 bytes down to: 19269171 bytes][]
[1595629153.689][2020-07-24 22:19:13][ip-54-40-19-139][30218][WebServer][debug][9][Sending compressed HTTP response with gzip: 200 OK][{"Content-Type":"application/json","Access-Control-Allow-Origin":"*","Server":"Cronicle 1.0","Content-Length":19269171,"Content-Encoding":"gzip"}]
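
The gzip line in the excerpt reports a response of 70348812 bytes before compression, so one rough way to spot oversized responses is to scan the log for those messages. This is only a sketch: the function name `large_responses` and the 10 MB threshold are assumptions made for illustration, and the excerpt does not show which API call produced that response.

```python
import re

# Matches the WebServer debug message format seen in the excerpt.
GZIP_RE = re.compile(
    r"Compressed text output with gzip: (\d+) bytes down to: (\d+) bytes"
)


def large_responses(lines, threshold_bytes=10_000_000):
    """Return (raw_bytes, gzipped_bytes) pairs for big responses.

    threshold_bytes is an arbitrary illustrative cutoff, not a Cronicle setting.
    """
    found = []
    for line in lines:
        match = GZIP_RE.search(line)
        if match is None:
            continue
        raw, gzipped = int(match.group(1)), int(match.group(2))
        if raw >= threshold_bytes:
            found.append((raw, gzipped))
    return found
```

Passing the lines of the WebServer log to `large_responses` would list every response whose uncompressed body crossed the threshold, which may help narrow down which requests are heavy.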

yanicakj avatar Jul 24 '20 23:07 yanicakj

That error comes from the UI, not from the server. I guess your server is just overwhelmed when you see that error. Is it pulling get_history every 500 ms, or did you mean 5 seconds? Also, how many events are scheduled?

mikeTWC1984 avatar Jul 25 '20 01:07 mikeTWC1984

@mikeTWC1984 Thanks for the reply. The server being overwhelmed was our guess too, good point, makes sense.

It is pulling the get_history/get_schedule every 5 seconds-ish.

Usually there can be around 20 jobs with 30-60 minutes wait times pending to be executed.

Also including the details here in case any of the numbers look alarming:

image

yanicakj avatar Jul 26 '20 03:07 yanicakj

OK. 5 seconds is not that bad. Do you think the history polling is killing it, or would it happen regardless? Do you set any limit on the history request? I guess you are likely creating too many concurrent reads/writes for your disk (especially if your events are also reading from or writing to the disk). Check what kind of disk you are using (maybe it's an HDD) and whether you can upgrade it or switch to S3. You may also try Redis as your storage if you have enough memory. If you believe the polling job is causing this, you can probably try using websockets to trigger it. If you run it every 5 seconds just to emulate real time, and the actual state changes only a few times a minute, that might help.
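
As a sketch of what limiting the history request could look like from a Python client: the code below assumes the `get_history` endpoint lives at `/api/app/get_history/v1` and accepts `offset` and `limit` query parameters plus an `api_key`, which is how I understand the Cronicle API docs, so verify it against your version. The base URL, the placeholder key, and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:3012"  # assumption: Cronicle on the same host
API_KEY = "YOUR_API_KEY"            # placeholder, not a real key


def history_url(base_url, api_key, offset=0, limit=50):
    """Build a get_history URL that asks for one bounded page of jobs."""
    query = urllib.parse.urlencode(
        {"api_key": api_key, "offset": offset, "limit": limit}
    )
    return f"{base_url}/api/app/get_history/v1?{query}"


def fetch_history_page(base_url, api_key, offset=0, limit=50, timeout=10):
    """Fetch a single page; the timeout keeps a slow server from blocking the poller."""
    url = history_url(base_url, api_key, offset, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_history_page(BASE_URL, API_KEY, limit=25)` would then request only the 25 most recent completed jobs instead of an unbounded list, assuming the server honors `limit` as described.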

mikeTWC1984 avatar Jul 26 '20 04:07 mikeTWC1984

What was the solution to this problem? We are facing a similar issue with the get_history API timing out.

keenkeystrokes avatar Jul 19 '22 09:07 keenkeystrokes