
Occasional image error (connection reset) breaks whole document

Status: open. cocorossello opened this issue 8 months ago (1 comment).

We are using external images in our PDFs. These external images sometimes time out or fail with other errors.

Timeouts are already handled by WeasyPrint. The problem is that, on rare occasions, fetching the URL fails with a connection reset instead.

I can't provide a reproducer, since this error is rare, but the stack trace is very clear:

Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/flask/app.py", line 1511, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.13/site-packages/flask/app.py", line 919, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.13/site-packages/flask/app.py", line 917, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.13/site-packages/flask/app.py", line 902, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/usr/src/app/wsgi.py", line 54, in checkauth
    return f(*args, **kwargs)
  File "/usr/src/app/wsgi.py", line 114, in generate
    pdf = html.write_pdf(optimize_images=True, jpeg_quality=60)
  File "/usr/local/lib/python3.13/site-packages/weasyprint/__init__.py", line 265, in write_pdf
    self.render(font_config, counter_style, **options)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/weasyprint/__init__.py", line 222, in render
    return Document._render(self, font_config, counter_style, options)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/weasyprint/document.py", line 246, in _render
    root_box = build_formatting_structure(
        html.etree_element, context.style_for, context.get_image_from_uri,
        html.base_url, context.target_collector, counter_style,
        context.footnotes)
  File "/usr/local/lib/python3.13/site-packages/weasyprint/formatting_structure/build.py", line 67, in build_formatting_structure
    box_list = element_to_box(
        element_tree, style_for, get_image_from_uri, base_url,
        target_collector, counter_style, footnotes)
  File "/usr/local/lib/python3.13/site-packages/weasyprint/formatting_structure/build.py", line 190, in element_to_box
    child_boxes = element_to_box(
        child_element, style_for, get_image_from_uri, base_url,
        target_collector, counter_style, footnotes, state)
  File "/usr/local/lib/python3.13/site-packages/weasyprint/formatting_structure/build.py", line 190, in element_to_box
    child_boxes = element_to_box(
        child_element, style_for, get_image_from_uri, base_url,
        target_collector, counter_style, footnotes, state)
  File "/usr/local/lib/python3.13/site-packages/weasyprint/formatting_structure/build.py", line 190, in element_to_box
    child_boxes = element_to_box(
        child_element, style_for, get_image_from_uri, base_url,
        target_collector, counter_style, footnotes, state)
  [Previous line repeated 8 more times]
  File "/usr/local/lib/python3.13/site-packages/weasyprint/formatting_structure/build.py", line 258, in element_to_box
    return html.handle_element(element, box, get_image_from_uri, base_url)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/weasyprint/html.py", line 72, in handle_element
    return HTML_HANDLERS[element.tag](
           ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        element, box, get_image_from_uri, base_url)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/weasyprint/html.py", line 117, in handle_img
    image = get_image_from_uri(
        url=src, orientation=box.style['image_orientation'])
  File "/usr/local/lib/python3.13/site-packages/weasyprint/images.py", line 310, in get_image_from_uri
    string = result['file_obj'].read()
  File "/usr/local/lib/python3.13/http/client.py", line 495, in read
    s = self._safe_read(self.length)
  File "/usr/local/lib/python3.13/http/client.py", line 642, in _safe_read
    data = self.fp.read(amt)
  File "/usr/local/lib/python3.13/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
           ~~~~~~~~~~~~~~~~~~~~^^^
  File "/usr/local/lib/python3.13/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
           ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer

I see that only these exceptions are handled:

    except (URLFetchingError, ImageLoadingError) as exception:
        LOGGER.error('Failed to load image at %r: %s', url, exception)
        LOGGER.debug('Error while loading image:', exc_info=exception)
        image = None
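A minimal, self-contained sketch of why the reset escapes this handler. `URLFetchingError` and `ImageLoadingError` below are local stand-ins (not imports from WeasyPrint), and `ResetOnRead` and `load_image_data` are hypothetical names. The sketch assumes, as the traceback suggests, that `read()` runs inside the `try` whose handler is quoted above.

```python
import logging

LOGGER = logging.getLogger("weasyprint-sketch")


class URLFetchingError(IOError):
    """Local stand-in for WeasyPrint's URLFetchingError (sketch only)."""


class ImageLoadingError(ValueError):
    """Local stand-in for WeasyPrint's ImageLoadingError (sketch only)."""


class ResetOnRead:
    """Hypothetical file object that fails like the SSL socket in the traceback."""

    def read(self):
        raise ConnectionResetError(104, "Connection reset by peer")


def load_image_data(url, file_obj):
    """Simplified version of the quoted handler: only two error types are caught."""
    try:
        return file_obj.read()
    except (URLFetchingError, ImageLoadingError) as exception:
        LOGGER.error("Failed to load image at %r: %s", url, exception)
        return None


# ConnectionResetError is an OSError, but it is neither a URLFetchingError nor an
# ImageLoadingError, so it escapes the handler and aborts the caller (the whole render).
print(issubclass(ConnectionResetError, OSError))           # True
print(issubclass(ConnectionResetError, URLFetchingError))  # False
try:
    load_image_data("https://example.com/logo.png", ResetOnRead())
except ConnectionResetError as error:
    print("escaped:", error)
```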

Is there any reason not to simply catch Exception? I have changed it locally and it works fine.
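The local change described here amounts to widening the `except` clause. A sketch of that variant, using the hypothetical names `load_image_data_broad` and `ResetOnRead` (not WeasyPrint code):

```python
import io
import logging

LOGGER = logging.getLogger("weasyprint-sketch")


class ResetOnRead:
    """Hypothetical file object whose read() fails with a connection reset."""

    def read(self):
        raise ConnectionResetError(104, "Connection reset by peer")


def load_image_data_broad(url, file_obj):
    """Hypothetical variant of the handler that catches every Exception."""
    try:
        return file_obj.read()
    except Exception as exception:
        LOGGER.error("Failed to load image at %r: %s", url, exception)
        LOGGER.debug("Error while loading image:", exc_info=exception)
        return None  # the image is skipped; the rest of the document still renders


print(load_image_data_broad("https://example.com/ok.png", io.BytesIO(b"data")))  # b'data'
print(load_image_data_broad("https://example.com/logo.png", ResetOnRead()))      # None
```

This stops one broken image from aborting the PDF, but a bare `except Exception` also swallows unrelated bugs (for example a `TypeError` raised by the loading code itself), which would then only appear as a logged error.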

Thanks.

cocorossello, Apr 29 '25 12:04

Hi!

Thanks for the bug report. You’re right, there’s a problem here.

I think that having a custom URLFetchingError is a good idea, and keeping the error management in fetch is a better idea if we want to solve this problem for images and everywhere else fetch is used (CSS, attachments…).
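A sketch of "keeping the error management in fetch": a single helper converts whatever the URL fetcher raises into one exception type, so image, stylesheet and attachment code can all share the same `except` clause. `fetch`, `load_resource` and the two fetchers are hypothetical; the result dict (`file_obj`, `mime_type`) is only loosely modeled on WeasyPrint's URL fetcher results, and catching every `Exception` is an assumption of this sketch.

```python
import io
from contextlib import contextmanager


class URLFetchingError(IOError):
    """Local stand-in for WeasyPrint's URLFetchingError (sketch only)."""


@contextmanager
def fetch(url_fetcher, url):
    """Hypothetical fetch helper: any exception raised by url_fetcher is
    re-raised as URLFetchingError, so callers need a single except clause."""
    try:
        result = url_fetcher(url)
    except Exception as exception:
        raise URLFetchingError(f"{type(exception).__name__}: {exception}") from exception
    try:
        yield result
    finally:
        # Always release the underlying file object, even if the caller fails.
        file_obj = result.get("file_obj")
        if file_obj is not None:
            file_obj.close()


def load_resource(kind, url, url_fetcher):
    """Shared pattern for images, stylesheets, attachments, etc."""
    try:
        with fetch(url_fetcher, url) as result:
            return result["file_obj"].read()
    except URLFetchingError as exception:
        print(f"Failed to load {kind} at {url!r}: {exception}")
        return None


def refusing_fetcher(url):
    """Hypothetical fetcher that fails before returning anything."""
    raise ConnectionRefusedError(111, "Connection refused")


def ok_fetcher(url):
    """Hypothetical fetcher that returns an in-memory body."""
    return {"file_obj": io.BytesIO(b"body { color: red }"), "mime_type": "text/css"}


print(load_resource("stylesheet", "https://example.com/a.css", ok_fetcher))
print(load_resource("image", "https://example.com/a.png", refusing_fetcher))
```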

The problem here is that fetch returns a file object that raises the error only when it is read, not before it is returned. It looks like the data is always read in the end, so there’s no real reason to defer calling read until after fetch has returned. Reading inside fetch would also simplify the code in fetch contexts.
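A sketch of reading the body inside fetch, so that a reset during `read()` is converted like any other fetching error instead of escaping later. `fetch_and_read`, `ResetOnRead` and the fetchers are hypothetical names, the dict keys are loosely modeled on WeasyPrint's URL fetcher results, and this is an illustration of the idea rather than WeasyPrint's implementation.

```python
import io
from contextlib import contextmanager


class URLFetchingError(IOError):
    """Local stand-in for WeasyPrint's URLFetchingError (sketch only)."""


@contextmanager
def fetch_and_read(url_fetcher, url):
    """Hypothetical fetch variant: the body is read inside the error-handling
    block, so a failure during read() also surfaces as URLFetchingError."""
    try:
        result = url_fetcher(url)
        file_obj = result.pop("file_obj", None)
        if file_obj is not None:
            try:
                result["string"] = file_obj.read()
            finally:
                file_obj.close()
    except Exception as exception:
        raise URLFetchingError(f"{type(exception).__name__}: {exception}") from exception
    # Callers receive bytes under "string", never a lazy file object.
    yield result


class ResetOnRead:
    """Hypothetical file object that fails mid-body, like the socket in the traceback."""

    def read(self):
        raise ConnectionResetError(104, "Connection reset by peer")

    def close(self):
        pass


def flaky_fetcher(url):
    return {"file_obj": ResetOnRead(), "mime_type": "image/png"}


def ok_fetcher(url):
    return {"file_obj": io.BytesIO(b"\x89PNG"), "mime_type": "image/png"}


try:
    with fetch_and_read(flaky_fetcher, "https://example.com/logo.png") as result:
        pass
except URLFetchingError as error:
    print("caught:", error)
```

The trade-off is that the whole body is held in memory before the caller sees it, which seems acceptable when, as noted above, the data is always read anyway.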

liZe, May 03 '25 06:05