html2image
There is no obvious way to use a local image / how to do so is not documented.
I have a very small project on a Windows 10 machine with the following image in the file example.png:
I am trying to take a screenshot of a simple HTML page using the following code:
from html2image import Html2Image
hti = Html2Image()
html_str = """
<!DOCTYPE html>
<html>
<body>
<img src="example.jpg">
</body>
</html>
"""
css_str = """
body {
    margin: 0;
    background: black;
}
"""
hti.screenshot(html_str=html_str, css_str=css_str, save_as="test.png")
I would expect to see the image in the result. But it only results in the following:
If I copy the HTML and CSS strings into files (and link them), I can see the expected result in my browser. So I guess it has nothing to do with file read permissions or the like.
Hello, this issue is due to how html2image handles files.
Context
When you pass the html_str and css_str parameters to the screenshot method, their contents are written to an HTML file and a CSS file, respectively, inside a directory located in your temp folder (%temp% on Windows). The HTML file is then opened with Chrome/Chromium in headless mode to take a screenshot. Because the CSS file is placed in the same directory as the HTML file, the browser is able to find the stylesheet.
In our case, the browser can't find the image because its relative path leads nowhere (there is no example.jpg file in the temp directory used by html2image).
Solution
There are four ways to solve this issue.
1 & 2. Use absolute paths or use a URL
The first option is to use absolute paths when you refer to resources that are located on your machine, so that they can always be found.
So instead of example.jpg you could use C:\the\current\path\to\example.jpg
The second option is to host the image somewhere and use its URL directly.
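As a rough sketch of the absolute-path option, the location can be computed instead of hard-coded. The helper name local_image_src is hypothetical and not part of html2image; it uses only pathlib and returns a file:// URI, on the assumption that the headless browser is allowed to read local files.

```python
from pathlib import Path


def local_image_src(filename: str) -> str:
    """Return an absolute file:// URI for a path relative to the current directory.

    Hypothetical helper, not part of html2image.
    """
    # Path.cwd() is absolute, so the joined path can always be turned into a URI.
    return (Path.cwd() / filename).as_uri()


# Build the HTML with an absolute reference instead of a bare relative name.
html_str = f"""
<!DOCTYPE html>
<html>
<body>
<img src="{local_image_src('example.jpg')}">
</body>
</html>
"""
```

The resulting html_str can then be passed to hti.screenshot exactly as in the original snippet.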
3. Load the images using the load_file method
The third option is to manually "load" the image(s) you are using into the directory used by html2image. Behind the scenes, this method creates a copy of the file and places it in the temp directory used by html2image.
from html2image import Html2Image
hti = Html2Image()
hti.load_file('example.jpg')
...
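When several images are involved, the same load_file call can be repeated in a loop. The sketch below is only an illustration: load_images is a hypothetical helper, not part of html2image, and it assumes each file keeps its base name once copied, so that relative src values such as example.jpg still match.

```python
from pathlib import Path


def load_images(hti, directory=".", extensions=(".jpg", ".jpeg", ".png", ".gif")):
    """Copy every image found in `directory` into html2image's temp directory.

    `hti` is expected to be an Html2Image instance; only its load_file
    method is used. Hypothetical helper, not part of html2image.
    """
    loaded = []
    for path in sorted(Path(directory).iterdir()):
        # Skip sub-directories and anything that is not a known image type.
        if path.is_file() and path.suffix.lower() in extensions:
            hti.load_file(str(path))
            loaded.append(path.name)
    return loaded


# Usage (requires html2image and a Chrome/Chromium install):
# from html2image import Html2Image
# hti = Html2Image()
# load_images(hti)
# hti.screenshot(html_str=html_str, css_str=css_str, save_as="test.png")
```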
4. Change the temp directory used by html2image
The fourth option is more of a 'hack', but it can be useful when you don't want to load each resource manually (when you have multiple images, or when you don't necessarily know their names). It consists of setting the temporary directory used by html2image to the current directory. The HTML file(s) will then be generated in the current directory, where the image already is.
from html2image import Html2Image
hti = Html2Image(temp_path='./')
...
That should help you solve this issue; let me know if you still encounter problems. Please do not close the issue, as I will keep it as a reminder to document this behaviour (you can remove yourself from the "participants" list if you do not wish to receive updates).
TODO
- Using images like this in HTML files is quite trivial, and this behaviour should have been documented.
- Potentially changing the load_file method name to something more explicit like copy_file_to_tmp.
- Not entirely related: potentially a way to remove the generated HTML files afterwards.
Thanks a lot, worked like a charm.
As an alternative, you can also embed the PNG as a base64 string in the HTML itself.
Example:
import base64
import re

# Replace each PNG src path with a base64-encoded data URI.
# Assumes `html` already holds the HTML source as a string.
png_paths = re.findall(r'src="(\S+?\.png)"', html)
for png_path in png_paths:
    with open(png_path, 'rb') as f:
        base64_png = 'data:image/png;base64,' + base64.b64encode(f.read()).decode('ascii')
    html = html.replace(png_path, base64_png)
It uses a regular expression to detect src attributes ending with .png (and defined with double quotes) in the HTML. Then it opens each image, converts it to a base64 string, and injects it back into the HTML.
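Since the original question uses example.jpg rather than a PNG, the same idea can be extended to other image types. The following is a minimal sketch, not an html2image feature: inline_local_images is a hypothetical helper that guesses the MIME type with the standard mimetypes module and leaves URLs, existing data URIs, non-image files, and missing files untouched.

```python
import base64
import mimetypes
import re
from pathlib import Path


def inline_local_images(html: str, base_dir: str = ".") -> str:
    """Replace local, double-quoted image src paths with base64 data URIs.

    Hypothetical helper, not part of html2image.
    """
    def replace(match):
        src = match.group(1)
        # Remote resources and already-inlined data are left unchanged.
        if src.startswith(("http://", "https://", "data:")):
            return match.group(0)
        path = Path(base_dir) / src
        mime, _ = mimetypes.guess_type(path.name)
        # Only inline files that exist and look like images.
        if mime is None or not mime.startswith("image/") or not path.is_file():
            return match.group(0)
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        return f'src="data:{mime};base64,{encoded}"'

    return re.sub(r'src="([^"]+)"', replace, html)
```

Calling html_str = inline_local_images(html_str) before hti.screenshot would then make the HTML self-contained, at the cost of a larger string.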