html2image

Errors and performance

Open fbillardmadrieres opened this issue 3 years ago • 3 comments

Hi, hti.screenshot is working fine for us, although we have concerns regarding performance.

We get error messages during screenshots:

[0615/164730.282582:ERROR:file_io.cc(91)] ReadExactly: expected 8, observed 0
[0615/164730.283193:ERROR:xattr.cc(63)] setxattr org.chromium.crashpad.database.initialized on file /var/folders/8_/qh0jnghd1xd_99znh3z6tvgc0000gn/T/: Operation not permitted (1)
[0615/164730.329920:WARNING:headless_browser_main_parts.cc(106)] Cannot create Pref Service with no user data dir.
[0615/164730.396624:WARNING:address_sorter_posix.cc(388)] FromSockAddr failed on netmask


Do these errors have an impact on performance? Currently it takes 1.6 s to generate a frame, and we would like to generate up to 100 (25 img/s for 4 s).
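For scale, the figures quoted above can be worked through as a back-of-the-envelope estimate. This is only a sketch: the variable names are illustrative, and it assumes every frame takes the same 1.6 s and that frames are generated one after another.

```python
# Back-of-the-envelope estimate from the figures quoted in the question.
FPS = 25                 # target frame rate (images per second)
DURATION_S = 4           # length of the animation in seconds
SECONDS_PER_FRAME = 1.6  # reported time for one hti.screenshot call

frame_count = FPS * DURATION_S
sequential_total_s = frame_count * SECONDS_PER_FRAME

print(f"{frame_count} frames, about {sequential_total_s:.0f} s if generated one after another")
```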

fbillardmadrieres, Jun 16 '21 08:06

Hello, thank you for posting an issue.


Regarding the error messages: these errors and warnings are not output by html2image itself, but by Chrome/Chromium, which html2image calls to generate images. On some of the operating systems I've tested html2image on, I have indeed come across such messages, but I didn't look further into them, as they didn't appear on most systems and didn't seem to prevent image generation.

If you don't mind sharing the OS, OS version and Chrome version of the machine you are running html2image on, I can check whether I get the same error messages and whether they seem to impact performance.


Regarding performance, it is greatly affected by what you are screenshotting (and of course by the specs of the machine). For example, screenshotting a file (or a string using the html_str parameter) that only contains text and is located on your machine is very fast compared to screenshotting a URL, because the page load time is added on top of the time taken by the file generation. The size at which you are screenshotting also slightly affects the time it takes.
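One way to check how much the input type matters on a given machine is to time a few calls of each kind. The helper below is a minimal sketch, not part of html2image: time_calls and fake_render are hypothetical names, and fake_render is a stand-in so the snippet runs without Chrome installed.

```python
import time
from statistics import mean


def time_calls(fn, repeats=3):
    """Call fn() `repeats` times and return the mean duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def fake_render():
    # Hypothetical stand-in for a screenshot call; it only sleeps briefly.
    time.sleep(0.01)


if __name__ == "__main__":
    print(f"stand-in: {time_calls(fake_render):.3f} s per call")
    # With html2image installed, the same helper could time real calls, e.g.:
    #   hti = Html2Image()
    #   time_calls(lambda: hti.screenshot(html_str="<h1>Hi</h1>", save_as="local.png"))
    #   time_calls(lambda: hti.screenshot(url="https://www.python.org", save_as="remote.png"))
```

Comparing the two commented-out calls should show the page-load overhead described above, though the exact numbers will depend on the network and the machine.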

Overall, there is not much room for optimization, because we are relying on Chrome for the image generation. If you are generating a large number of pictures, though, you may want to use some form of parallel processing. Parallel processing is not implemented in html2image (yet?), but you can add it to your own scripts if you wish.

Here is an example using a ThreadPoolExecutor from concurrent.futures:

import time
import concurrent.futures

from html2image import Html2Image


HTML = """<h1> An interesting title </h1>"""
CSS = "body {background: red;}"
SCREENSHOT_COUNT = 100

hti = Html2Image(output_path='hti_parallel')

# Some class I have on my machine, don't know the source anymore
class Timer(object):
    ''' Timing Context Manager
    '''
    def __enter__(self):
        self.start = time.perf_counter_ns()
        return self

    def __exit__(self, *args):
        end = time.perf_counter_ns()
        self.duration = (end - self.start) * 10**-9  # 1 nano-sec = 10^-9 sec

def test_classic():
    results = [
        hti.screenshot(
            html_str=HTML,
            # url='https://www.python.org',
            css_str=CSS,
            save_as=f'classic_{i}.png',
        )
        for i in range(SCREENSHOT_COUNT)
    ]

def test_TPE():
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        results = [
            executor.submit(
                hti.screenshot,
                html_str=HTML,
                # url='https://www.python.org',
                css_str=CSS,
                save_as=f'TPE_{i}.png',
            )
            for i in range(SCREENSHOT_COUNT)
        ]

if __name__ == '__main__':
    print(f'Generating {SCREENSHOT_COUNT} images...')

    with Timer() as t:
        test_classic()
    print(f'Classic generation took:\t {t.duration:.3f} \tseconds.')

    with Timer() as t:
        test_TPE()
    print(f'Generation with ThreadPoolExecutor took:\t {t.duration:.3f} \tseconds.')

When screenshotting a simple HTML string, I get the following output:

Generating 100 images...
Classic generation took:         28.232         seconds.
Generation with ThreadPoolExecutor took:         14.029         seconds.

When screenshotting a URL (by uncommenting the url parameter and commenting out html_str + css_str), I get the following output:

Generating 100 images...
Classic generation took:         68.978         seconds.
Generation with ThreadPoolExecutor took:         24.627         seconds.

In both cases, a significant performance improvement can be seen by using a ThreadPoolExecutor.
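One caveat with submitting jobs to an executor: an exception raised inside a worker is stored on its future and only re-raised when result() is called, so a failed screenshot can go unnoticed if the futures are never read. The variant below is a sketch of one way to collect both results and failures. run_parallel and fake_render are hypothetical names, and fake_render is a stand-in for a function wrapping hti.screenshot so the snippet runs without Chrome.

```python
import concurrent.futures


def run_parallel(render, names, max_workers=5):
    """Run render(name) for every name; return (successes, failures)."""
    successes, failures = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_name = {executor.submit(render, name): name for name in names}
        for future in concurrent.futures.as_completed(future_to_name):
            name = future_to_name[future]
            try:
                successes[name] = future.result()
            except Exception as exc:  # keep going if one frame fails
                failures[name] = exc
    return successes, failures


def fake_render(name):
    # Hypothetical stand-in for a function wrapping hti.screenshot.
    if name == "frame_3.png":
        raise RuntimeError("simulated failure")
    return [f"out/{name}"]


if __name__ == "__main__":
    names = [f"frame_{i}.png" for i in range(5)]
    ok, failed = run_parallel(fake_render, names)
    print(f"{len(ok)} succeeded, {len(failed)} failed")
```

The best max_workers value depends on the machine. The 5 used here simply mirrors the example above.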

vgalin, Jun 16 '21 18:06

Hi, thanks a lot for your detailed answer. I ran my test on OS X on a MacBook Pro. I'm screenshotting HTML code containing web animations in JS. Each generated image is a step in the animation.

I've redone the test using your sample code, and the TPE took 28.5 s for 100 images, which is much better than the standard version.

I still have these error messages:

[0617/095440.939724:ERROR:xattr.cc(63)] setxattr org.chromium.crashpad.database.initialized on file /var/folders/8_/qh0jnghd1xd_99znh3z6tvgc0000gn/T/: Operation not permitted (1)

[0617/095441.400263:WARNING:ipc_message_attachment_set.cc(49)] MessageAttachmentSet destroyed with unconsumed attachments: 0/1

[0617/095441.503360:ERROR:command_buffer_proxy_impl.cc(123)] ContextResult::kTransientFailure: Failed to send GpuChannelMsg_CreateCommandBuffer.

[0617/095440.775369:WARNING:headless_browser_main_parts.cc(106)] Cannot create Pref Service with no user data dir.

fbillardmadrieres, Jun 17 '21 08:06

Hello again. I recently ran html2image on a few Docker images based on different operating systems and sometimes encountered messages like these. Using specific Chrome flags should theoretically make some of these warnings disappear and resolve the errors, but most of the time it didn't change anything. I can't really do much about it, as it is an "issue" directly related to Chrome / Chromium. The best option would probably be to open a ticket on bugs.chromium.org (or at least search for existing ones).
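For anyone who wants to try the flag approach anyway, here is a minimal sketch. It rests on two assumptions: that the installed html2image version accepts a custom_flags argument in its constructor, and that the Chromium switches listed actually reduce the log noise on your setup (they may well change nothing). QUIET_FLAGS and make_quiet_hti are hypothetical names.

```python
# Assumption: these Chromium switches reduce log noise; whether they help
# depends on the Chrome version and the OS.
QUIET_FLAGS = [
    "--log-level=3",  # raise the minimum log level so only fatal messages are logged
    "--disable-gpu",  # skip GPU initialisation, which is rarely needed in headless mode
]


def make_quiet_hti(output_path="."):
    """Build an Html2Image instance with the flags above (needs html2image)."""
    # Imported lazily so this sketch can be loaded without html2image installed.
    from html2image import Html2Image

    return Html2Image(output_path=output_path, custom_flags=QUIET_FLAGS)
```

Custom flags may replace the flags the library passes by default, so it is worth comparing a screenshot taken with and without them before relying on this.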

vgalin, Jul 06 '21 18:07