
PDF embeds only the first 19 external images, then leaves the rest blank. No error is raised

Status: Open · jathayde opened this issue 1 year ago · 3 comments

Issue description

When generating a PDF of many cloth patches (a collector resource), the cover image and the first 18 items render correctly. The remaining items, which use the same partial as the first 18, render their HTML/CSS but not their images. Images are embedded using a method based on this comment: https://github.com/mileszs/wicked_pdf/issues/36#issuecomment-91027880

My full method:

  require 'base64'
  require 'open-uri'

  def embed_remote_image(url, content_type)
    # Images are binary data, so read them in binary mode.
    asset = URI.open(url, "rb", &:read)
    # strict_encode64 emits no line breaks, so no whitespace stripping is needed.
    base64 = Base64.strict_encode64(asset)
    "data:#{content_type};base64,#{Rack::Utils.escape(base64)}"
  rescue OpenURI::HTTPError => e
    if e.message == '404 Not Found'
      Rails.logger.debug "Missing file"
    elsif e.message == '403 Forbidden'
      Rails.logger.debug "Forbidden file"
    else
      Rails.logger.debug { "HTTP Error: #{e.message}" }
    end
    # Return nil explicitly; otherwise the method returns the logger's result (true).
    nil
  end

The PDF build is triggered from a Sidekiq job. The cover is built separately, using the same image method, and the two PDFs are merged at the end. Images are stored on S3 behind CloudFront, and they render fine on the web version of the page. The header and footer are also included as PDF files. ulimit is unlimited on both dev (macOS) and prod (Ubuntu 20.04.6 LTS "Focal").

Expected or desired behavior

All of the images would render correctly.

System specifications

wicked_pdf gem version (output of cat Gemfile.lock | grep wicked_pdf): 2.6.4

wkhtmltopdf version (output of wkhtmltopdf --version): 0.12.6 (with patched qt)

wkhtmltopdf provider gem and version, if one is used: wkhtmltopdf-binary 0.12.6.6 (production uses the Heroku variant, version 2.12.6.0, with similar results)

platform/distribution and version (e.g. Windows 10 / Ubuntu 16.04 / Heroku cedar): Development runs on macOS Sonoma with Ruby 3.2.2 (2023-03-30 arm64-darwin22) and Rails 7.1.2. Production is Ubuntu 20.04.6 LTS, running via Dokku instances.

jathayde, Dec 09 '23 19:12

Have you tried rendering the same image more than 18 times, and have you debugged the value the URI read returns? I've rendered base64 images, and it handles 500 easily.

dmitry, Jan 26 '24 11:01
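A minimal sketch of the test suggested above, assuming the goal is to take remote fetching out of the picture entirely: it builds one HTML page that repeats a single inline image many times. repeated_image_html is a hypothetical helper, and the placeholder bytes (just the PNG file signature) stand in for a real image that would be loaded with File.binread before rendering.

```ruby
require 'base64'

# Hypothetical helper: builds an HTML page that repeats one inline image
# `count` times, so the renderer sees many identical data URIs and makes
# no network requests for them.
def repeated_image_html(image_bytes, content_type, count)
  data_uri = "data:#{content_type};base64,#{Base64.strict_encode64(image_bytes)}"
  tags = Array.new(count) { |i| %(<img src="#{data_uri}" alt="item #{i + 1}">) }
  "<!DOCTYPE html><html><body>#{tags.join("\n")}</body></html>"
end

# Placeholder bytes only; swap in File.binread("some_patch.png") for a real test.
html = repeated_image_html("\x89PNG\r\n\x1A\n".b, "image/png", 500)
puts html.scan("<img").size # prints 500
```

Passing the resulting string to WickedPdf.new.pdf_from_string(html) both in a controller action and inside the Sidekiq job would show whether the cutoff depends on where the render runs rather than on how many images the page contains.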

I suspect the problem is that each image has to be loaded from a remote source, and doing that many times takes longer than the asset-loading timeout, so wkhtmltopdf gives up.

You could try adjusting the timeout or using the window_status setting. If the same images repeat, though, I'd suggest caching them so they aren't re-downloaded every time.

Maybe something like this (untested, but I'm sure you'll see where I'm going):

require 'base64'
require 'open-uri'

def embed_remote_image(url, content_type)
  # Set up a hash to memoize assets that have already been downloaded.
  @assets ||= {}

  # Return early if this URL is already cached.
  return @assets[url] if @assets.key?(url)

  # Images are binary, so read in binary mode; strict_encode64 emits no newlines.
  asset = URI.open(url, "rb", &:read)
  base64 = Base64.strict_encode64(asset)
  result = "data:#{content_type};base64,#{Rack::Utils.escape(base64)}"

  # Cache { url => result } in @assets for later requests to the same URL.
  @assets[url] = result

  result
rescue OpenURI::HTTPError => e
  if e.message == '404 Not Found'
    Rails.logger.debug "Missing file"
  elsif e.message == '403 Forbidden'
    Rails.logger.debug "Forbidden file"
  else
    Rails.logger.debug { "HTTP Error: #{e.message}" }
  end
  # Return nil explicitly; otherwise the method returns the logger's result (true).
  nil
end

unixmonkey, Jan 26 '24 17:01
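The @assets hash in the method above only lives as long as the object the helper runs on, which is typically a single render. As a sketch, assuming a cache shared across renders and across Sidekiq threads is wanted, a process-wide cache guarded by a Mutex could look like this; RemoteImageCache and its injectable fetcher are hypothetical names, not part of wicked_pdf.

```ruby
require 'base64'
require 'open-uri'

# Hypothetical process-wide cache: unlike an instance variable on a view
# helper, one instance can be shared by every render and every thread.
class RemoteImageCache
  # fetcher: any callable that takes a URL and returns the raw image bytes.
  # Defaults to open-uri in binary mode; a stub can be injected for tests.
  def initialize(fetcher = ->(url) { URI.open(url, "rb", &:read) })
    @fetcher = fetcher
    @entries = {}
    @mutex = Mutex.new
  end

  # Returns a base64 data URI for url, fetching each URL at most once.
  # If the fetch raises, nothing is cached and the error propagates.
  def data_uri(url, content_type)
    @mutex.synchronize do
      @entries[url] ||= begin
        bytes = @fetcher.call(url)
        "data:#{content_type};base64,#{Base64.strict_encode64(bytes)}"
      end
    end
  end
end
```

In the app this could be held in a constant (for example, a hypothetical IMAGE_CACHE = RemoteImageCache.new) and called from the helper. The lock is held during the download, which guarantees each URL is fetched only once at the cost of serializing downloads, and entries are never evicted, so it suits a bounded set of images such as one catalog.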

Interesting follow-up: I've been working on this intermittently. A real-time render of a similar PDF on the site handles hundreds of images with no problem. It only happens when the render is handed off to a background task, so something in either the controller or the task is failing silently. Nothing obvious shows up in the console (though the output scrolls by very quickly as everything is pulled in). My guess is that something is breaking in the background task rather than in the image fetching itself.

jathayde, Mar 28 '24 17:03
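Both versions of the helper above report fetch failures through Rails.logger.debug, and those messages are dropped when the log level is :info, the Rails production default, so a failed fetch inside the job could leave no trace. A diagnostic sketch, assuming that is worth ruling out: ImageEmbedder is a hypothetical class that logs each failure at WARN, records it, also rescues the network and timeout errors the original rescue clause lets through, and passes open-uri timeouts whose values are arbitrary assumptions.

```ruby
require 'base64'
require 'logger'
require 'net/http'
require 'open-uri'

# Hypothetical diagnostic wrapper around the same data-URI idea.
class ImageEmbedder
  FETCH_ERRORS = [OpenURI::HTTPError, Net::OpenTimeout, Net::ReadTimeout,
                  SocketError, SystemCallError].freeze

  # Each entry is [url, error class name, error message].
  attr_reader :failures

  # The default fetcher uses open-uri with explicit timeouts in seconds;
  # 5 and 15 are illustrative values, not recommendations.
  def initialize(logger: Logger.new($stderr),
                 fetcher: ->(url) { URI.open(url, "rb", open_timeout: 5, read_timeout: 15, &:read) })
    @logger = logger
    @fetcher = fetcher
    @failures = []
  end

  # Returns a data URI, or nil after logging and recording the failure.
  def embed(url, content_type)
    bytes = @fetcher.call(url)
    "data:#{content_type};base64,#{Base64.strict_encode64(bytes)}"
  rescue *FETCH_ERRORS => e
    @failures << [url, e.class.name, e.message]
    @logger.warn("Image embed failed for #{url}: #{e.class}: #{e.message}")
    nil
  end
end
```

In the job, Rails.logger could be passed as logger:, and checking failures after the render (for example, raising when it is not empty) would turn blank images into a visible job error.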