solid_queue

Recurring schedule stops working after an hour

Open lrjbrual opened this issue 11 months ago • 5 comments

Hi,

Our current setup runs on DigitalOcean: an API that sends data to another third-party app. Jobs are processed successfully for one to two hours. We are using a DigitalOcean App Platform container.

Here is our recurring.yml setup:

production:
  salesforce_invoice_sync:
    class: SalesforceInvoiceSync::InvoiceSyncJob
    schedule: every 10 minutes

development:
  salesforce_invoice_sync:
    class: SalesforceInvoiceSync::InvoiceSyncJob
    schedule: every hour
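
The schedule every 10 minutes is expected to fire on wall-clock 10-minute boundaries; a later comment mentions the 19:50 run being skipped. A minimal Ruby sketch, assuming cron-style */10 semantics (runs aligned to minute 0 of each hour), that lists the runs expected in a time window so they can be compared with the finished jobs in Mission Control. expected_runs is a hypothetical helper, not part of Solid Queue.

```ruby
# Hypothetical helper (not part of Solid Queue): times at which a job scheduled
# "every N minutes" should fire between from and to, assuming cron-style */N
# semantics. N must divide 60 so slots line up with the top of each hour.
def expected_runs(from, to, every_minutes)
  step = every_minutes * 60
  # Round `from` up to the next slot; Unix time is aligned to UTC minute and hour boundaries.
  first = (from.to_i + step - 1) / step * step
  (first..to.to_i).step(step).map { |t| Time.at(t).utc }
end

runs = expected_runs(Time.utc(2024, 12, 6, 19, 41), Time.utc(2024, 12, 6, 20, 20), 10)
puts runs.map { |t| t.strftime("%H:%M") }.join(", ")
# prints: 19:50, 20:00, 20:10, 20:20
```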

However, after one to two hours the jobs stop running. Here is a screenshot of the last finished job: [image] After that, nothing else runs: [image]

Here is the peak memory usage: [image]

Our logs:

Dec 06 13:41:48 innovation-funding-7fd67954d9-qln2z InnovationFunding I, [2024-12-06T21:41:48.038204 #43]  INFO -- :   Parameters: {"server_id"=>"solid_queue", "application_id"=>"innovationfunding", "status"=>"finished"}
Dec 06 13:41:48 innovation-funding-7fd67954d9-qln2z InnovationFunding I, [2024-12-06T21:41:48.093696 #43]  INFO -- : Completed 200 OK in 53ms (Views: 23.0ms | ActiveRecord: 19.1ms (67 queries, 52 cached) | GC: 10.0ms)
Dec 06 13:41:48 innovation-funding-7fd67954d9-qln2z InnovationFunding I, [2024-12-06T21:41:48.090483 #43]  INFO -- :   Rendered layout layouts/mission_control/jobs/application.html.erb (Duration: 28.0ms | GC: 10.0ms)
Dec 06 13:44:42 innovation-funding-7fd67954d9-qln2z SyncHubWorker D, [2024-12-06T21:44:42.095384 #598] DEBUG -- : SolidQueue-1.0.2 Prune dead processes (5.6ms)  size: 0
Dec 06 13:49:42 innovation-funding-7fd67954d9-qln2z SyncHubWorker D, [2024-12-06T21:49:42.182049 #598] DEBUG -- : SolidQueue-1.0.2 Prune dead processes (76.9ms)  size: 0
Dec 06 13:49:42 innovation-funding-7fd67954d9-qln2z SyncHubWorker D, [2024-12-06T21:49:42.741373 #608] DEBUG -- : SolidQueue-1.0.2 Unblock jobs (1.3ms)  limit: 500, size: 0
Dec 06 13:54:42 innovation-funding-7fd67954d9-qln2z SyncHubWorker D, [2024-12-06T21:54:42.253818 #598] DEBUG -- : SolidQueue-1.0.2 Prune dead processes (65.0ms)  size: 0
Dec 06 13:59:42 innovation-funding-7fd67954d9-qln2z SyncHubWorker D, [2024-12-06T21:59:42.274625 #598] DEBUG -- : SolidQueue-1.0.2 Prune dead processes (8.9ms)  size: 0
Dec 06 13:59:42 innovation-funding-7fd67954d9-qln2z SyncHubWorker D, [2024-12-06T21:59:42.767003 #608] DEBUG -- : SolidQueue-1.0.2 Unblock jobs (4.2ms)  limit: 500, size: 0
Dec 06 13:59:47 innovation-funding-7fd67954d9-qln2z InnovationFunding I, [2024-12-06T21:59:47.668392 #43]  INFO -- : Started GET "/jobs/applications/innovationfunding/finished/jobs?server_id=solid_queue" for 172.71.98.110 at 2024-12-06 21:59:47 +0000
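
In this log excerpt, the SolidQueue-1.0.2 Prune dead processes entries recur every five minutes with size: 0, which suggests no dead process was found at those times. Solid Queue documents a heartbeat every 60 seconds (process_heartbeat_interval) and treats a process as dead once its last heartbeat is older than 5 minutes (process_alive_threshold). A minimal sketch of that staleness rule, with hypothetical names and the threshold passed in as an assumption, that could be applied to rows read from the solid_queue_processes table:

```ruby
# Hypothetical sketch of a heartbeat staleness rule. The 5-minute threshold
# mirrors Solid Queue's documented process_alive_threshold default
# (assumption: check your version and configuration).
ALIVE_THRESHOLD = 5 * 60 # seconds

ProcessRow = Struct.new(:kind, :pid, :last_heartbeat_at)

# Returns the rows whose last heartbeat is older than the threshold.
def stale_processes(rows, now: Time.now, threshold: ALIVE_THRESHOLD)
  rows.select { |row| now - row.last_heartbeat_at > threshold }
end

now = Time.utc(2024, 12, 6, 22, 0, 0)
rows = [ # sample data, not taken from the logs
  ProcessRow.new("Supervisor", 59, now - 30),
  ProcessRow.new("Worker", 74, now - 45),
  ProcessRow.new("Scheduler", 78, now - 600) # silent for 10 minutes
]
stale_processes(rows, now: now).each { |row| puts "#{row.kind} (pid #{row.pid}) looks dead" }
# prints: Scheduler (pid 78) looks dead
```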

In addition, since we also run Puma, we added bin/jobs start: [image]

I am not sure what is going on or why the jobs stop. Do we need to add a worker? In another app we use GoodJob, and there we run only a web component on DigitalOcean, without a worker; it seems to work, even under a heavy API load.
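
Solid Queue can run either as its own process (bin/jobs start) or inside Puma through the plugin it ships with, which starts the job processes together with the web server. A sketch of the config/puma.rb line as Rails 8 generates it; the SOLID_QUEUE_IN_PUMA environment variable is that generated file's gate, so adapt it if your puma.rb differs:

```ruby
# config/puma.rb (fragment)
# Starts the Solid Queue supervisor together with Puma when the
# SOLID_QUEUE_IN_PUMA environment variable is set, so a single web
# container runs both the web server and the job processes.
plugin :solid_queue if ENV["SOLID_QUEUE_IN_PUMA"]
```

If the plugin is used, bin/jobs start should not also be launched in the same container; otherwise a second supervisor, with its own dispatcher, workers, and scheduler, runs side by side. The alternative is a dedicated worker component whose run command is bin/jobs start.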

Thank you in advance for your assistance and help.

lrjbrual • Dec 06 '24 23:12

Hey @lrjbrual, sorry for the delay! It's odd that the job stops being scheduled. When this happens, can you check which processes are running on the server where you run Solid Queue? Just run something like:

ps axl | grep solid
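
Solid Queue sets descriptive process titles (solid-queue-supervisor, solid-queue-worker, and so on), so this check can also be scripted, for example from a health check. A minimal Ruby sketch with hypothetical helper names that groups PIDs by role and reports which of the four roles are missing. The sample input is a trimmed excerpt of the ps output posted in this thread, with the dispatcher and scheduler lines removed to show a missing role.

```ruby
# Hypothetical helpers: extract Solid Queue roles from `ps axl` output by
# matching the process titles Solid Queue sets, e.g. "solid-queue-worker(1.1.0): ...".
EXPECTED_ROLES = %w[supervisor dispatcher worker scheduler].freeze

def solid_queue_roles(ps_output)
  ps_output.each_line.with_object(Hash.new { |h, k| h[k] = [] }) do |line, roles|
    match = line.match(/solid-queue-(\w+)\(/)
    next unless match # skips unrelated lines such as "grep solid"
    roles[match[1]] << line.split[2].to_i # third column of `ps axl` is the PID
  end
end

def missing_roles(ps_output)
  EXPECTED_ROLES - solid_queue_roles(ps_output).keys
end

# In production this could be: ps_output = `ps axl`
ps_output = <<~PS
  0  1000    59    42  20   0 836564 228588 ?     Sl   ?          0:05 solid-queue-supervisor(1.1.0): supervising 71, 74, 78
  0  1000    74    59  20   0 839444 232840 ?     Sl   ?          0:11 solid-queue-worker(1.1.0): waiting for jobs in invoice_jobs
  0  1000   155   146  20   0  11812  5080 ?      S    ?          0:00 grep solid
PS
solid_queue_roles(ps_output).each { |role, pids| puts "#{role}: #{pids.join(', ')}" }
puts "missing: #{missing_roles(ps_output).join(', ')}"
# prints: supervisor: 59, worker: 74, missing: dispatcher, scheduler (one per line)
```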

rosa • Dec 09 '24 18:12

Hi @rosa, no problem, and thanks for your reply. I am using the DigitalOcean App Platform with a web component only, without a worker component.

Here is the output of ps axl | grep solid:

0  1000    59    42  20   0 836564 228588 ?     Sl   ?          0:05 solid-queue-supervisor(1.1.0): supervising 71, 74, 78
0  1000    71    59  20   0 837140 228868 ?     Sl   ?          0:09 solid-queue-dispatcher(1.1.0): dispatching every 1 seconds
0  1000    74    59  20   0 839444 232840 ?     Sl   ?          0:11 solid-queue-worker(1.1.0): waiting for jobs in invoice_jobs
0  1000    78    59  20   0 839124 230888 ?     Sl   ?          0:04 solid-queue-scheduler(1.1.0): scheduling salesforce_invoice_sync
0  1000   155   146  20   0  11812  5080 ?      S    ?          0:00 grep solid

When I checked again later:

rails@innovation-funding-5cff94d599-qsw5r:/rails$ ps axl | grep solid
0  1000   221   157  20   0  11812  4628 ?      S    ?          0:00 grep solid

My jobs stopped again: [image]

It skipped the 19:50 run, and the next one will surely be skipped too: [image]
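
A skipped run such as the 19:50 one shows up as an unusually long gap between consecutive finished jobs. A minimal sketch for spotting such gaps in a list of finished-at times (for example copied from Mission Control); the helper name, the sample times, and the 1.5x tolerance are assumptions:

```ruby
# Hypothetical helper: given the finished-at times of a recurring job that should
# run every `interval` seconds, returns [previous, next] pairs whose gap exceeds
# the interval by more than the tolerance factor.
def schedule_gaps(times, interval:, tolerance: 1.5)
  times.sort.each_cons(2).select { |a, b| b - a > interval * tolerance }
end

finished = [ # sample data
  Time.utc(2024, 12, 9, 19, 20), Time.utc(2024, 12, 9, 19, 30),
  Time.utc(2024, 12, 9, 19, 40), Time.utc(2024, 12, 9, 20, 0) # 19:50 missing
]
schedule_gaps(finished, interval: 600).each do |a, b|
  puts "no run between #{a.strftime('%H:%M')} and #{b.strftime('%H:%M')}"
end
# prints: no run between 19:40 and 20:00
```

If the scheduler has stopped entirely, there is no later run to close the gap, so the last finished time should also be compared against the current time.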

lrjbrual • Dec 09 '24 19:12

Huh, what happened between this

0  1000    59    42  20   0 836564 228588 ?     Sl   ?          0:05 solid-queue-supervisor(1.1.0): supervising 71, 74, 78
0  1000    71    59  20   0 837140 228868 ?     Sl   ?          0:09 solid-queue-dispatcher(1.1.0): dispatching every 1 seconds
0  1000    74    59  20   0 839444 232840 ?     Sl   ?          0:11 solid-queue-worker(1.1.0): waiting for jobs in invoice_jobs
0  1000    78    59  20   0 839124 230888 ?     Sl   ?          0:04 solid-queue-scheduler(1.1.0): scheduling salesforce_invoice_sync
0  1000   155   146  20   0  11812  5080 ?      S    ?          0:00 grep solid

and this?

After I revalidate again:

rails@innovation-funding-5cff94d599-qsw5r:/rails$ ps axl | grep solid
0  1000   221   157  20   0  11812  4628 ?      S    ?          0:00 grep solid

As you can see there, Solid Queue is no longer running, so that's why the job is not being enqueued. Any idea what happened?

rosa • Dec 09 '24 19:12

That is exactly what I am trying to find out: why it is happening. I added Honeybadger to investigate. It seems that after three attempts Solid Queue gets cleared out, but I cannot find out why yet.

@rosa, do you also know how to clean up lingering connections to the database, for example so that an existing connection is reused instead of a new one being created? I found that it connects multiple times, and I have limited the database connections to 22. I'm not sure how to clean up or reuse the connections; I'm still exploring the Solid Queue documentation.

I will keep monitoring until tomorrow to see whether I still have an issue with Solid Queue. I ran bin/jobs start again, and it has now been running for almost two and a half hours; I hope it continues without issues.
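
Restarting bin/jobs start by hand brings the processes back but does not explain why they stopped. As a stopgap while investigating, one option is a small wrapper that relaunches the command when it exits, with capped exponential backoff. This is a generic sketch, not a Solid Queue feature; run_with_restarts and its parameters are hypothetical, and the launcher is injected so the logic can be exercised without spawning real processes.

```ruby
# Generic restart loop with capped exponential backoff (not a Solid Queue feature).
# `launch` runs the job processes and returns true on a clean exit; in production
# it might be -> { system("bin/jobs", "start") }.
def run_with_restarts(launch, max_restarts: 5, base_delay: 1, max_delay: 60, sleeper: ->(secs) { sleep(secs) })
  restarts = 0
  loop do
    return true if launch.call # clean exit: stop supervising
    restarts += 1
    return false if restarts > max_restarts # give up after too many crashes
    delay = [base_delay * 2**(restarts - 1), max_delay].min
    warn "jobs process exited; restart #{restarts}/#{max_restarts} in #{delay}s"
    sleeper.call(delay)
  end
end

outcomes = [false, false, true] # simulated: two crashes, then a clean exit
delays = []
ok = run_with_restarts(-> { outcomes.shift }, sleeper: ->(secs) { delays << secs })
puts ok # prints true
```

Because the loop only relaunches the process, it hides the underlying failure, so error reporting and the process checks should stay in place alongside it.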

lrjbrual • Dec 09 '24 20:12

Hey @lrjbrual, so sorry for the delay replying! Somehow I missed the notification for your last comment 🤦‍♀️

You don't need to clean up any DB connections manually; Solid Queue relies on Active Record for that.
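
Active Record keeps a connection pool per process: a thread checks out an idle connection if one exists, opens a new one only while the pool is below its configured size, and checks the connection back in when done, so connections are reused rather than re-created. The toy pool below only illustrates that checkout/checkin idea in plain Ruby; it is not Active Record's implementation, and ToyPool and its methods are hypothetical.

```ruby
# Toy connection pool, for illustration only (this is not Active Record's code):
# idle connections are reused, and at most `size` connections are ever opened.
class ToyPool
  attr_reader :opened

  def initialize(size, &connect)
    @size = size
    @connect = connect
    @idle = []
    @opened = 0
    @lock = Mutex.new
    @available = ConditionVariable.new
  end

  # Checks a connection out for the duration of the block, then returns it.
  def with_connection
    conn = checkout
    yield conn
  ensure
    checkin(conn) if conn
  end

  private

  def checkout
    @lock.synchronize do
      loop do
        return @idle.pop unless @idle.empty? # reuse an idle connection
        if @opened < @size                   # below the cap: open a new one
          conn = @connect.call
          @opened += 1
          return conn
        end
        @available.wait(@lock)               # at the cap: wait for a checkin
      end
    end
  end

  def checkin(conn)
    @lock.synchronize do
      @idle.push(conn)
      @available.signal
    end
  end
end

opened_ids = 0
pool = ToyPool.new(2) { opened_ids += 1; "conn-#{opened_ids}" }
first = pool.with_connection { |c| c }
second = pool.with_connection { |c| c }
puts first == second # prints true: the idle connection was reused
```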

I have limited the database connection to 22

Where did you set this limit? In Rails's database.yml configuration or in the database itself?
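
Because each process has its own Active Record pool, a rough worst case for a server-side limit like 22 is the sum of every process's pool size, per database it connects to. A back-of-the-envelope sketch, assuming the four Solid Queue processes from the ps output plus one Puma process, each using the same pool value from database.yml (the value 5 is illustrative). The worker check reflects the Solid Queue README's advice to keep worker threads at or below pool minus 2, since polling and heartbeat each need a connection; verify it against the version you run. AppProcess and both helpers are hypothetical.

```ruby
# Back-of-the-envelope connection budget (illustrative numbers only).
# Each Ruby process can open at most `pool` connections per database it uses,
# so the worst case across processes is the sum.
AppProcess = Struct.new(:name, :pool, :databases)

def worst_case_connections(processes)
  processes.sum { |p| p.pool * p.databases }
end

# Solid Queue README guidance (assumption: check your version):
# worker threads <= pool - 2.
def worker_threads_fit?(threads:, pool:)
  threads <= pool - 2
end

pool = 5 # assumed value of `pool` in database.yml
processes = %w[puma supervisor dispatcher worker scheduler].map { |n| AppProcess.new(n, pool, 1) }
puts "worst case: #{worst_case_connections(processes)} connections against a limit of 22"
# prints: worst case: 25 connections against a limit of 22
puts worker_threads_fit?(threads: 3, pool: pool) # prints true
```

Pools open connections lazily, so real usage is usually lower; the point is that a database-side cap must be compared against this sum across processes, not against a single pool.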

rosa • Jan 03 '25 20:01