Need advice on Solid Queue's memory usage

yjchieng opened this issue 1 year ago • 53 comments

Ruby: 3.3.4 Rails: 7.2.1 Solid Queue: 0.7.0, 0.8.2


I run a Rails app on an AWS EC2 instance with 1 GB of memory. I notice the Solid Queue process takes up 15-20% of the instance's memory, which makes it the single largest process by memory usage.


What I checked:

  1. Checked memory usage by stopping/starting supervisorctl (which I use to manage my Solid Queue process):

     stop supervisorctl - free memory 276MB
     start supervisorctl - free memory 117MB

     It increases by 159MB.

  2. Stopped the supervisorctl service and ran "solid_queue:start" directly, to see whether this is something related to supervisor:

     before solid_queue:start - free memory 252MB
     after solid_queue:start - free memory 109MB

     It increases by 143MB.

  3. Then I noticed there was a newer version, so I upgraded to 0.8.2 (was 0.7.0):

     stop supervisorctl - free memory 220MB
     start supervisorctl - free memory 38MB

     It increases by 182MB.


I need some advice:

  1. Is 150-200MB the minimum requirement to run "solid_queue:start"?
  2. Is there any setting/feature that I can switch off to reduce memory usage?
  3. Is there any setting I can use to limit the maximum memory usage?

And, thanks a lot for making this wonderful gem. :)

yjchieng avatar Sep 07 '24 06:09 yjchieng

Hey @yjchieng, thanks for opening this issue! 🙏 I think it depends a lot on your app. A brand new Rails app seems to use around 74.6MB of memory for me after booting (without Solid Queue, just running Puma). Since you're measuring free memory before and after starting the supervisor, and the supervisor forks more processes, I think the consumption you're seeing is from all the processes together and not just the supervisor. Are you running multiple workers or just one? I think reducing the number of workers there would help. Another thing that might help is using bin/jobs, which preloads the whole app before forking, but the gains there are usually quite modest.
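
For reference, a reduced configuration along those lines might look like the sketch below. The values are purely illustrative assumptions, not maintainer recommendations; the idea is simply that a single worker process with a low thread count forks only one worker alongside the supervisor and dispatcher.

# queue.yml (illustrative single-worker sketch)
default: &default
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: "*"
      threads: 1
      processes: 1
      polling_interval: 1

production:
  <<: *default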

rosa avatar Sep 07 '24 10:09 rosa

There might also be something else going on, because the only changes from version 0.7.0 to 0.8.2 were to the installation part of Solid Queue; nothing besides the initial installation changed, so the memory footprint shouldn't have changed. I imagine there is other stuff running on your AWS instance at the same time that might be consuming memory as well.

rosa avatar Sep 07 '24 10:09 rosa

Up 🆙🔥

I have huge memory issues in production (Rails 7.2 + activeJob + solidQueue). Everything works just fine in dev mode, but in production, there seems to be a memory leak. After restarting my production server, I get to roughly ~75% RAM usage. Very quickly (talking in minutes...) I get to ~100%. And if I let the app run for the weekend and come back on Monday (like today), I get to... 288% RAM usage... I tried removing all the lines in my code related to solidQueue, and I can confirm that this is what's causing the memory issue in production.

The exact error codes I'm getting, causing my app to crash in production (Heroku), are R14 and R15.

Any advice/suggestions would be very much appreciated, fellow devs. Have an amazing day!

Focus-me34 avatar Oct 28 '24 19:10 Focus-me34

@Focus-me34, what version of Solid Queue are you running? And when you say you're removing anything related to Solid Queue, what Active Job adapter are you using instead?

rosa avatar Oct 28 '24 19:10 rosa

@rosa I'm using Solid Queue version 1.0.0. I checked all sub-dependency versions, and they all satisfy the requirements. We didn't really try any other adapter, since Solid Queue will be the default adapter in RoR 8; hence, we really want to make it work this way.

Here's some of our setup code:

# scrape_rss_feed_job.rb
class ScrapingJob < ApplicationJob
  queue_as :default
  limits_concurrency to: 1, key: -> { "rss_feed_job" }, duration: 1.minute

  def perform
    Api::V1::EntriesController.fetch_latest_entries
  end
end

# recurring.yml
default: &default
  periodic_cleanup:
    class: ScrapeSecRssFeedJob
    schedule: every 2 minutes

development:
  <<: *default

test:
  <<: *default

production:
  <<: *default

# queue.yml
default: &default
  dispatchers:
    - polling_interval: 1
      batch_size: 500
      concurrency_maintenance_interval: 15
  workers:
    - queues: "*"
      threads: 3
      processes: <%= ENV.fetch("JOB_CONCURRENCY", 1) %>
      polling_interval: 0.1

development:
  <<: *default

test:
  <<: *default

production:
  <<: *default

Do you see anything weird?

Focus-me34 avatar Oct 29 '24 09:10 Focus-me34

No, that looks good to me re: configuration. You said:

I tried removing all the lines in my code related to solidQueue, and I can confirm that this is what's causing the memory issue in production.

So, if you don't get any memory issues when not using Solid Queue, and you're not using another adapter, does that mean you're not running any jobs at all? Because that would point to the jobs having a memory leak, not Solid Queue.
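
One way to check that, sketched below under the assumption that the same jobs can run unchanged, is to temporarily point Active Job at a different adapter, for example Rails' built-in :async adapter, and compare memory. If usage still grows, the leak is more likely in the jobs than in Solid Queue.

# config/environments/production.rb (temporary diagnostic sketch)
Rails.application.configure do
  # In-process adapter shipped with Rails; fine for a short comparison run,
  # not a long-term replacement for Solid Queue.
  config.active_job.queue_adapter = :async
end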

rosa avatar Oct 29 '24 09:10 rosa

Hey Rosa, sorry for the delayed reply! I've been very busy at work.

Here’s where we're at: I've been working on getting our company’s code running smoothly with Rails 7.2 and Solid Queue (in production Heroku). As I mentioned earlier, it’s been a huge challenge, and unfortunately, we haven’t had much success with it.

My colleague and I decided to take a closer look at our code to see if the problem was on our end. Since my last comment here, we’ve implemented tests, and I can confirm that the code is behaving exactly as expected.

Our next step, after troubleshooting the high memory usage on Heroku, was to switch away from Solid Queue and try a different job adapter (as you suggested). I set up Sidekiq as the adapter, and we saw a drastic improvement: memory usage dropped from around 170% of our 512 MB quota to a range of 25%-70%.

This leads me to believe that there might be a memory leak in production when using Solid Queue. From our observations, it seems that after the initial job execution completes, instance variables at the top of the function (which should reset to nil at the start of each job) are retaining the values from the previous iteration. We suspect this might be preventing the Garbage Collector from clearing memory properly between jobs.

Let me know if there's any more information I can provide to help you investigate. We’re really looking forward to moving back to using the built-in Solid Queue functionality once this issue is resolved.

[Edit: The job we're running involves two main dependencies. We scrape an RSS feed using Nokogiri and fetch a URL for each entry using httparty]
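
A hypothetical sketch (illustrative names only, not the real app code) of one way that symptom can arise: Active Job builds a fresh job instance for each execution, so instance variables on the job itself don't carry over, but instance variables set inside a class method, such as the Api::V1::EntriesController.fetch_latest_entries call above, live on the class object and are retained across jobs.

# hypothetical_feed_fetcher.rb (illustrative only)
class FeedFetcher
  def self.fetch_latest_entries
    # Class-level instance variable: it belongs to the FeedFetcher class
    # object itself, so it survives every job run and is never reclaimed.
    @entries ||= []
    @entries.concat(download_entries)
  end

  def self.download_entries
    # Stand-in for the Nokogiri/HTTParty work described above.
    Array.new(10_000) { "entry" }
  end
end

# Each "job" adds another 10,000 retained strings:
3.times { FeedFetcher.fetch_latest_entries }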

Focus-me34 avatar Nov 07 '24 00:11 Focus-me34

I am observing high memory usage when Solid Queue is used on Heroku as well. Have there been any solutions to this issue? Is there a temporary fix that can be used for now?

rajeevriitm avatar Mar 21 '25 13:03 rajeevriitm

solid queue with puma plugin is stupidly using high memory.

arikarim avatar Mar 24 '25 11:03 arikarim

Hey @rajeevriitm, no, no real solutions. I had an async adapter that would run the supervisor, workers and dispatchers and everything together in the same process, which would save memory. However, this was scrapped from the 1.0 version. I need to push for that again.

In the meantime, you can try using rake solid_queue:start to run Solid Queue, as that, by default, won't preload your whole app, and see if that makes a difference. What's your configuration like?

@arikarim, thanks for your very helpful and useful comment 😒

rosa avatar Mar 24 '25 11:03 rosa

Haha, sorry, I was just so tired of this... until I found out that I was running rails s instead of bundle exec puma. My bad.

arikarim avatar Mar 24 '25 11:03 arikarim

I still think there are some strange issues: with Puma concurrency greater than 0, the memory goes up 😢

arikarim avatar Mar 24 '25 12:03 arikarim

@rosa I have a simple configuration; the queue runs with a single-server config. I have a continuously running job that runs every 30 mins.

queue.yml

default: &default
  dispatchers:
    - polling_interval: 2
      batch_size: 500
  workers:
    - queues: "*"
      threads: 1
      processes: 1
      polling_interval: 2

development:
  <<: *default

test:
  <<: *default

production:
  <<: *default

puma.rb

threads_count = ENV.fetch("RAILS_MAX_THREADS", 3)
threads threads_count, threads_count
port ENV.fetch("PORT", 3000)
plugin :tmp_restart

# Run the Solid Queue supervisor inside of Puma for single-server deployments
plugin :solid_queue 

pidfile ENV["PIDFILE"] if ENV["PIDFILE"]

rajeevriitm avatar Mar 24 '25 20:03 rajeevriitm

@rosa there seems to be a problem with newer versions of Solid Queue. I have downgraded my solid_queue gem to 1.1.0 and the memory issue is fixed. 😄

arikarim avatar Mar 25 '25 11:03 arikarim

+1, I'm running the latest solid_queue version and am definitely experiencing memory issues. Once I start the container, memory usage keeps increasing indefinitely until it hits the maximum capacity and the container restarts.

I also tried switching from bin/jobs to a Rake task, but that didn’t help either.

Note: I don't run any jobs.

default: &default
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: "*"
      threads: 3
      processes: <%= ENV.fetch("JOB_CONCURRENCY", 1) %>
      polling_interval: 0.1

development:
  <<: *default

test:
  <<: *default

production:
  <<: *default

Image

IvanPakhomov99 avatar Mar 26 '25 23:03 IvanPakhomov99

hi 👋 I'm experiencing the same issue:

Ruby: 3.4.2 Rails: 8.0.2 Solid Queue: 1.1.4 Puma: plugin :solid_queue

Running on a DigitalOcean droplet with 1 GB of memory. I'll try to downgrade the solid_queue gem version to see if it improves.

eliasousa avatar Mar 29 '25 16:03 eliasousa

Hey all, so sorry about this! I've been swamped with other stuff at work, but I'm going to look into this on Monday.

rosa avatar Mar 29 '25 22:03 rosa

Thanks for addressing the issue. Hope it's resolved soon.

rajeevriitm avatar Mar 30 '25 08:03 rajeevriitm

@IvanPakhomov99, @rajeevriitm, could you try downgrading to version v1.1.2 and let me know if the issue persists? Also, what version of Ruby are you using?

rosa avatar Mar 31 '25 09:03 rosa

Hi @rosa, Thanks for your prompt response. I tested four different Solid Queue versions (1.1.0, 1.1.1, 1.1.2, and 1.1.4) but encountered the same issue across all of them.

Environment:

  • Ruby: 3.4.1
  • Rails: 8.0.1
  • Database: MySQL 8.0 (Solid Queue runs on the main instance)

Let me know if you need any additional details.

IvanPakhomov99 avatar Mar 31 '25 22:03 IvanPakhomov99

@rosa I tried downgrading. The issue exists in 1.1.2 as well.

Ruby 3.4.1
Rails 8.0.1

Happy to help.

rajeevriitm avatar Apr 01 '25 09:04 rajeevriitm

hey @rosa , were you able to identify the issue causing the memory leak?

rajeevriitm avatar Apr 03 '25 21:04 rajeevriitm

@rajeevriitm, I'm afraid I wasn't 😞 I reviewed all the code from v1.1.0 and didn't identify anything that could leak memory. This was before @IvanPakhomov99 shared that testing 1.1.0, 1.1.1, 1.1.2, and 1.1.4 made no difference. Then, I tried running jobs of different kinds (recurring jobs of different types being enqueued every 5 seconds, long-running jobs, etc.) over a couple of days and couldn't reproduce any memory leaks. I think whatever is happening depends on what your jobs are doing or what you're loading in your app. I also wonder if this is not a memory leak, but simply high memory usage. Solid Queue runs a separate process for each worker, a process for the dispatcher, another for the scheduler, and another for the supervisor. All those processes load your app, and even though the fork happens after the app is loaded, so copy-on-write (CoW) should allow memory sharing, this is not the same as running a single process.

I don't have a better idea to reduce memory usage other than providing a single-process execution mode (what I've called "async" mode).
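
To see that per-process cost directly, something like the sketch below (assuming a Linux-style ps and the default solid-queue-* process titles) prints the resident set size of each Solid Queue process:

# memory_check.rb (diagnostic sketch)
# Lists RSS for every process whose command line contains "solid-queue",
# e.g. the supervisor, dispatcher, scheduler and each worker.
`ps -eo rss,command`.lines.grep(/solid-queue/).each do |line|
  rss_kb, command = line.strip.split(" ", 2)
  puts format("%7.1f MB  %s", rss_kb.to_f / 1024, command)
end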

rosa avatar Apr 04 '25 14:04 rosa

@rosa Thanks for taking the time to look into this! The main caveat here is that it's a brand new service and I am not running any jobs yet. I also confirmed that the service pod's memory is stable.

That said, I actually have an update — I downgraded to version 1.0.2, and memory usage looks stable now. I’ll stick with this version for now.

IvanPakhomov99 avatar Apr 04 '25 22:04 IvanPakhomov99

Have the same issue on Heroku too, with version 1.1.4.

As mentioned above, downgrading to version 1.0.2 solved the issue.

Image

chayuto avatar Apr 05 '25 22:04 chayuto

@rosa Thank you for your attentiveness to this issue. I just downgraded solid_queue to 1.0.2 and can confirm that it drastically reduced the R14 related errors/memory usage.

jeffcoh23 avatar Apr 07 '25 00:04 jeffcoh23

Oh! Thanks a lot for confirming that! I had only looked back as far as version 1.1.0. I'm on call this week, so a bit short on time, but I'll try to figure out why memory usage increased between that version and 1.1.0.

rosa avatar Apr 07 '25 15:04 rosa

@rosa I’m not sure how relevant this is, but it might be worth taking another look at this commit: https://github.com/rails/solid_queue/commit/a152f2637e69675612737e7da533ae7f2d4f092e. It looks like interruptible_sleep was updated to use Promises.future on each call. Given how this method is used — often multiple times within a loop — it’s quite possible this could lead to increased memory consumption.

Even though the block uses .value, making it synchronous from the caller’s perspective, a new thread or fiber is still created under the hood for each call.

Sorry if that’s not the case — I might not be seeing the full picture. Just reviewing the changes between versions 1.1.0 and 1.0.2.
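
A rough way to sanity-check that hypothesis, sketched below as the general pattern rather than Solid Queue's actual interruptible_sleep, is to count allocations across many calls. Allocations alone don't prove a leak; the question is whether the futures, or anything their blocks capture, stay reachable after .value returns.

# promises_sleep_sketch.rb (illustrative; requires the concurrent-ruby gem)
require "concurrent"

def interruptible_sleep_like(seconds)
  # A fresh future object is allocated on every call, and its block runs on
  # concurrent-ruby's global executor before .value unblocks the caller.
  Concurrent::Promises.future { sleep(seconds) }.value
end

before = GC.stat(:total_allocated_objects)
1_000.times { interruptible_sleep_like(0.001) }
after = GC.stat(:total_allocated_objects)
puts "objects allocated across 1,000 calls: #{after - before}"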

IvanPakhomov99 avatar Apr 07 '25 21:04 IvanPakhomov99

@rosa Thank you for your attentiveness to this issue. I just downgraded solid_queue to 1.0.2 and can confirm that it drastically reduced the R14 related errors/memory usage.

Just to follow up on this: it seemed to have worked in the short term, but I am back to where I started, unfortunately... still seeing those R14/memory issues consistently.

jeffcoh23 avatar Apr 12 '25 01:04 jeffcoh23

We are experiencing the same unbounded memory growth issues with solid_queue, which results in OOMs. We're on solid-queue 1.1.3 and Ruby 3.3.7.

We can recreate this issue by running bin/jobs and letting it run, without enqueuing any jobs.

solid-queue-worker (1.1.3)

Here's what the memory size of the solid-queue worker process solid-queue-worker(1.1.3): waiting for jobs in * looks like:

Image

It's being sampled every minute for about 75 minutes and grows from about 141 MB to 840 MB in that time. This was run in development with YJIT on and eager loading off. We have run it with YJIT off and eager loading off, and the same issue persists.

solid-queue-supervisor (1.1.3)

Here's what the memory size of the solid-queue supervisor process solid-queue-supervisor(1.1.3) looks like over the same time frame:

Image

Memory here also seems to grow but at a much slower rate.

Memory growth

This issue persists in all environments regardless of YJIT, eager loading, or, seemingly, other environmental factors. If left unchecked, solid-queue will eventually hit OOM errors, causing the pod/container it is running in to be killed.

The memory growth for the worker never seems to plateau. We have tried increasing memory, and Solid Queue will just run until it consumes all of it. Because the worker grows at a much faster rate than the supervisor, it's unclear whether the supervisor will experience unbounded memory growth. I suspect it will, given that memory growth is occurring when there are no jobs present.

No jobs during this benchmark

Just wanting to call out that there are no jobs being enqueued at all. It's an empty database.

solid-queue 1.1.4 behavior is up next

I'm now running with solid-queue 1.1.4, and so far the issue appears to exist there as well.

zdennis avatar Apr 20 '25 16:04 zdennis