Occasional image error (connection reset) breaks the whole document
We use external images in our PDFs. Those images sometimes time out or fail with other errors.
WeasyPrint already handles timeouts. The problem is that, on rare occasions, the remote server resets the connection while the image is being downloaded, and that error is not handled.
I can't provide a reproducer because the error is so rare, but the stack trace is clear:
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/flask/app.py", line 1511, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.13/site-packages/flask/app.py", line 919, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.13/site-packages/flask/app.py", line 917, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.13/site-packages/flask/app.py", line 902, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/usr/src/app/wsgi.py", line 54, in checkauth
    return f(*args, **kwargs)
  File "/usr/src/app/wsgi.py", line 114, in generate
    pdf = html.write_pdf(optimize_images=True, jpeg_quality=60)
  File "/usr/local/lib/python3.13/site-packages/weasyprint/__init__.py", line 265, in write_pdf
    self.render(font_config, counter_style, **options)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/weasyprint/__init__.py", line 222, in render
    return Document._render(self, font_config, counter_style, options)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/weasyprint/document.py", line 246, in _render
    root_box = build_formatting_structure(
        html.etree_element, context.style_for, context.get_image_from_uri,
        html.base_url, context.target_collector, counter_style,
        context.footnotes)
  File "/usr/local/lib/python3.13/site-packages/weasyprint/formatting_structure/build.py", line 67, in build_formatting_structure
    box_list = element_to_box(
        element_tree, style_for, get_image_from_uri, base_url,
        target_collector, counter_style, footnotes)
  File "/usr/local/lib/python3.13/site-packages/weasyprint/formatting_structure/build.py", line 190, in element_to_box
    child_boxes = element_to_box(
        child_element, style_for, get_image_from_uri, base_url,
        target_collector, counter_style, footnotes, state)
  File "/usr/local/lib/python3.13/site-packages/weasyprint/formatting_structure/build.py", line 190, in element_to_box
    child_boxes = element_to_box(
        child_element, style_for, get_image_from_uri, base_url,
        target_collector, counter_style, footnotes, state)
  File "/usr/local/lib/python3.13/site-packages/weasyprint/formatting_structure/build.py", line 190, in element_to_box
    child_boxes = element_to_box(
        child_element, style_for, get_image_from_uri, base_url,
        target_collector, counter_style, footnotes, state)
  [Previous line repeated 8 more times]
  File "/usr/local/lib/python3.13/site-packages/weasyprint/formatting_structure/build.py", line 258, in element_to_box
    return html.handle_element(element, box, get_image_from_uri, base_url)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/weasyprint/html.py", line 72, in handle_element
    return HTML_HANDLERS[element.tag](
           ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        element, box, get_image_from_uri, base_url)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/weasyprint/html.py", line 117, in handle_img
    image = get_image_from_uri(
        url=src, orientation=box.style['image_orientation'])
  File "/usr/local/lib/python3.13/site-packages/weasyprint/images.py", line 310, in get_image_from_uri
    string = result['file_obj'].read()
  File "/usr/local/lib/python3.13/http/client.py", line 495, in read
    s = self._safe_read(self.length)
  File "/usr/local/lib/python3.13/http/client.py", line 642, in _safe_read
    data = self.fp.read(amt)
  File "/usr/local/lib/python3.13/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
           ~~~~~~~~~~~~~~~~~~~~^^^
  File "/usr/local/lib/python3.13/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
           ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
As far as I can see, get_image_from_uri only catches these exceptions:
    except (URLFetchingError, ImageLoadingError) as exception:
        LOGGER.error('Failed to load image at %r: %s', url, exception)
        LOGGER.debug('Error while loading image:', exc_info=exception)
        image = None
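To see why the reset escapes that clause, here is a small self-contained sketch. URLFetchingError and ImageLoadingError are local stand-ins (hypothetical definitions, not WeasyPrint's own classes, whose base classes may differ), and ResettingStream and load_image are made-up names. The point it illustrates: ConnectionResetError is a built-in OSError subclass, so it matches neither class in the narrow except clause and propagates up to write_pdf.

```python
# Stand-ins for WeasyPrint's exception classes (hypothetical definitions;
# the real classes may use different base classes).
class URLFetchingError(Exception):
    """Stand-in: an error happened while fetching a URL."""


class ImageLoadingError(Exception):
    """Stand-in: the fetched bytes could not be decoded as an image."""


class ResettingStream:
    """Hypothetical file object whose read() fails like a reset TLS socket."""

    def read(self, size=-1):
        raise ConnectionResetError(104, 'Connection reset by peer')


def load_image(file_obj):
    """Reads image bytes behind the same narrow except clause as the snippet."""
    try:
        return file_obj.read()  # a dropped connection is only noticed here
    except (URLFetchingError, ImageLoadingError):
        return None  # image skipped, the rest of the document still renders


# ConnectionResetError derives from OSError, not from either stand-in,
# so it propagates out of load_image instead of producing None.
try:
    load_image(ResettingStream())
except ConnectionResetError as error:
    print('escaped the handler:', error)
```

Catching a broader class in this one place would hide the symptom for images only; other code paths that read a fetched file object later would still be exposed to the same error.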
Is there any reason not to simply catch Exception? I changed it locally and it works fine.
Thanks.
Hi!
Thanks for the bug report. You’re right, there’s a problem here.
I think having a custom URLFetchingError is a good idea, and keeping the error handling inside fetch is even better if we want to solve this problem for images and everywhere else fetch is used (CSS, attachments…).
The problem is that fetch returns a file object that raises the error only when it is read, not before it is returned. The data always seems to be read at the end anyway, so there’s no real reason to delay the call to read until after fetch returns. Reading inside fetch would also simplify the code in fetch contexts.
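A minimal sketch of the approach described in the reply, assuming a url fetcher that returns a dict with either a string or a file_obj key (as the result['file_obj'].read() line in the traceback suggests). fetch_eagerly, load_image, ResettingStream and the fetcher functions are hypothetical names, and URLFetchingError is a local stand-in rather than WeasyPrint's class. Because the body is read inside the context manager, a reset during read() is turned into a URLFetchingError before any caller touches the result.

```python
import io
from contextlib import contextmanager


class URLFetchingError(Exception):
    """Hypothetical stand-in for WeasyPrint's URL fetching error."""


@contextmanager
def fetch_eagerly(url_fetcher, url):
    """Hypothetical fetch() that reads the response body before yielding.

    Errors from the fetcher itself and from reading the body are wrapped
    in URLFetchingError, so every caller needs only one except clause.
    """
    try:
        result = url_fetcher(url)
        if 'string' not in result:
            file_obj = result.pop('file_obj')
            try:
                result['string'] = file_obj.read()  # read now, not later
            finally:
                close = getattr(file_obj, 'close', None)
                if close is not None:
                    close()
    except Exception as exception:
        raise URLFetchingError(
            f'{type(exception).__name__}: {exception}') from exception
    # Outside the try: errors raised by the caller's own code in the
    # with-block are not wrapped.
    yield result


def load_image(url_fetcher, url):
    """Hypothetical caller: returns the image bytes, or None on failure."""
    try:
        with fetch_eagerly(url_fetcher, url) as result:
            return result['string']
    except URLFetchingError as exception:
        print(f'Failed to load image at {url!r}: {exception}')
        return None


class ResettingStream:
    """Hypothetical file object that fails like a reset TLS connection."""

    def read(self):
        raise ConnectionResetError(104, 'Connection reset by peer')


def flaky_fetcher(url):
    return {'file_obj': ResettingStream(), 'mime_type': 'image/png'}


def stream_fetcher(url):
    return {'file_obj': io.BytesIO(b'fake image bytes'),
            'mime_type': 'image/png'}


print(load_image(flaky_fetcher, 'https://example.com/a.png'))
print(load_image(stream_fetcher, 'https://example.com/b.png'))
```

The wrapping only covers fetching and reading, not decoding, so image-specific errors would still need their own handling. The trade-off is that the whole body is held in memory, which the reply suggests is effectively already the case since the data is always read in the end.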