puppeteer-ruby

Memory leak in rails apps using Puppeteer.connect(browser_ws_endpoint: '...') do |browser| .. end

Open preston opened this issue 1 year ago • 8 comments

Step To Reproduce / Observed behavior

I'm using puppeteer-ruby in a Rails 7.2.1 application that connects to an external browserless Chrome container via Puppeteer.connect(..) do |browser| .. end. Memory usage slowly creeps up. When the app is built into a Docker image, any hard memory limit is eventually hit, even though the Ruby VM tries to garbage collect. I call browser.close and browser.disconnect in an ensure clause within the block. Here's the exact block:

      Puppeteer.connect(browser_ws_endpoint: ENV['WEBSOCKET_CHROME_URL']) do |browser|
        Rails.logger.debug "Attempting to capture screenshot of: #{uri}"
        begin
          page = browser.new_page
          page.viewport = Puppeteer::Viewport.new(width: 1280, height: 1280)
          page.goto(uri.to_s, timeout: 5000) # , wait_until: 'domcontentloaded')
          self.http_screenshot = page.screenshot
        rescue StandardError => e
          # Errors can be thrown due to a number of things: DNS, timeout, etc.
          Rails.logger.debug 'Failed to capture screenshot.'
          Rails.logger.debug e
        ensure
          Rails.logger.debug 'Closing browser.'
          browser.close
          browser.disconnect
        end
      end

Expected behavior

Memory to remain fairly stable.
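
One way to narrow down where the growth comes from is to compare the number of live Ruby heap slots before and after the suspect code. The following is a minimal sketch, not part of the original report: `live_slot_delta` is a hypothetical helper name, and it only sees memory managed by the Ruby heap. If the process's resident memory grows while this number stays flat, the growth is more likely in native allocations outside the Ruby heap.

```ruby
# Hypothetical helper (not from puppeteer-ruby): returns how many live Ruby
# heap slots were added while the block ran, measured after a full GC on
# each side so that short-lived garbage is not counted.
def live_slot_delta
  GC.start
  before = GC.stat(:heap_live_slots)
  yield
  GC.start
  GC.stat(:heap_live_slots) - before
end
```

Calling it repeatedly around the screenshot block and logging the result would show whether retained Ruby objects accumulate from one run to the next.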

Environment

macOS with rvm

Paste the output of ruby --version

ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [arm64-darwin23]

preston Oct 23 '24 21:10

Also note that the method running this is within an ActiveRecord model class. I don't think that should matter.. unless it does. :)

preston Oct 23 '24 21:10

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] Apr 26 '25 05:04

This is still a big issue.

preston Apr 29 '25 18:04

I have the same issue with a very similar setup, except that I'm using Puppeteer.launch:

Puppeteer.launch(headless: headless, args: args) do |browser|
  page = browser.pages.first || browser.new_page

  # Puppeteer logic
rescue => e
  # timeout issues, etc.
ensure
  browser.pages.each { |pg| pg.close unless pg.closed? }

  # Other cleanup handled by the Puppeteer.launch ensure block
end

I'm using ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [aarch64-linux]

calebstclair May 19 '25 13:05

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] Jun 27 '25 03:06

I'm having the same issue.

CodyWatters Jun 29 '25 15:06

I was able to work around this issue by using a child process (not ideal, but it works). I run my Puppeteer code in a delayed job process that runs inside a Docker container. Forking confines the leak to a short-lived child process, so its memory is released when that process exits.

  def using_puppeteer(headless: true, args: DEFAULT_PUPPET_ARGS)
    file = Tempfile.new(SecureRandom.hex(10).to_s, Rails.root.join('tmp'))

    pid = Process.fork do
      Puppeteer.launch(headless: headless, args: args) do |browser|
        page = browser.pages.first || browser.new_page

        # Do whatever you need with puppeteer using a block
        puppet_result = yield(browser, page)

        file.write(puppet_result.to_json) if puppet_result.present? && puppet_result.respond_to?(:to_json)
      rescue => e
        # Do whatever on exception

        # Store exception so it can be given to the worker process
        file.write("Exception: #{e.message}")
      ensure
        file.flush

        # Do any cleanup operations on the page

        browser.close

        file.close unless file.closed?

        exit(0)
      end
    end

    Process.wait(pid)

    # Use the tempfile in the main process to handle whatever was returned by the puppeteer process
    file.rewind

    # Get whatever the puppet process returned, if anything
    result_from_puppet = file.read

    # Close and delete the tempfile now that its contents have been read
    file.close!

    return if result_from_puppet.blank?

    raise(result_from_puppet.split('Exception: ').last) if result_from_puppet.include?('Exception: ')

    JSON.parse(result_from_puppet)
  end

Then it can be used like this:

      using_puppeteer(headless: headless) do |_browser, page|
        # Do whatever you need with puppeteer. Memory will be cleaned up after use
      end
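
The same isolation idea can also be sketched with a pipe instead of a tempfile, which avoids leaving files behind in tmp. This is a minimal illustration under stated assumptions rather than a drop-in replacement: `run_isolated` is a hypothetical name, the platform must support `fork`, and the block's return value must be JSON-serializable. It has no Puppeteer or Rails dependency, so the block passed to it stands in for the Puppeteer work.

```ruby
require 'json'

# Hypothetical helper: runs the block in a forked child process and hands its
# JSON-serializable result back to the parent through a pipe.
def run_isolated
  reader, writer = IO.pipe

  pid = Process.fork do
    reader.close
    payload =
      begin
        JSON.generate('ok' => true, 'value' => yield)
      rescue StandardError => e
        JSON.generate('ok' => false, 'error' => "#{e.class}: #{e.message}")
      end
    writer.write(payload)
    writer.close
    # exit! skips at_exit handlers that the child inherited from the parent.
    exit!(0)
  end

  writer.close
  # Read before waiting so a large payload cannot fill the pipe and block the child.
  raw = reader.read
  reader.close
  Process.wait(pid)

  raise 'child process exited without returning a result' if raw.empty?

  result = JSON.parse(raw)
  raise result['error'] unless result['ok']

  result['value']
end
```

In the setup described above, the block given to `run_isolated` would wrap the `Puppeteer.launch` call and return whatever data the parent needs. An exception raised in the child is re-raised in the parent as a RuntimeError carrying the original class name and message.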

calebstclair Jun 30 '25 12:06

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] Jul 19 '25 03:07