html-to-docx
Images don't render when inside certain tags
Hello, I'm very excited to find this project and see that it gets around the altChunk problem. Does this library support outputting base64 images embedded in the HTML? I gave it a quick test, and it did not appear to output any images. I just wanted to make sure I'm not doing something wrong. If images aren't supported yet, consider this a feature request and/or an offer of help (if you could point me to the relevant code).
After a bit more experimentation I think it may be a bug caused by the image being wrapped in a <p> tag. I'll update the issue title.
@brockfanning Could you please post a test HTML string so I can reproduce the issue?
@privateOmega I tested it by adding <p> tags in the node example, like so: https://github.com/brockfanning/html-to-docx/commit/e40026a9db5b089e62359771a788c0dba9566483
I believe this issue is not limited to p tags. Images also do not render when nested at some level inside most tags other than div, th, or td. In my limited testing this includes:
- span
- li
- blockquote
- strong
- i
- u
I've spent a few hours poking around, and I think the issue is that the xml-builder only reaches buildImage when processing select elements, like divs and tables, but not when building paragraphs of inline elements or blockquotes. Unfortunately, I can't find the right place(s) to insert additional calls to buildImage to correct the problem. My attempts either led to call stack overflows or made only a few additional tags work. Any pointers, @privateOmega?
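To illustrate the diagnosis above, here is a minimal sketch contrasting a traversal that only descends into a few container tags with one that descends into every element. The node shape (`tagName`, `properties.src`, `children`) and the function names are hypothetical, loosely modeled on a virtual-DOM tree; this is not the library's xml-builder code.

```javascript
// Hypothetical simplified node shape (an assumption, not html-to-docx's
// internal type): { tagName, properties, children }.
// Text nodes have no tagName and no children.

// Tags searched for images in the "shallow" variant, mirroring the
// reported behavior (divs and table cells work).
const CONTAINER_TAGS = new Set(['div', 'table', 'tbody', 'tr', 'td', 'th']);

// Reported behavior: images nested in <p>, <span>, <li>, <blockquote>,
// <strong>, ... are never reached because those tags are not descended into.
function collectImagesShallow(node, found = []) {
  for (const child of node.children || []) {
    if (child.tagName === 'img') {
      found.push(child);
    } else if (CONTAINER_TAGS.has(child.tagName)) {
      collectImagesShallow(child, found);
    }
  }
  return found;
}

// Desired behavior: descend into every element, so an <img> is found at
// any nesting depth.
function collectImagesDeep(node, found = []) {
  for (const child of node.children || []) {
    if (child.tagName === 'img') {
      found.push(child);
    } else if (child.children) {
      collectImagesDeep(child, found);
    }
  }
  return found;
}

// Helpers to build the hypothetical tree concisely.
const el = (tagName, children = [], properties = {}) => ({ tagName, properties, children });
const img = (src) => el('img', [], { src });

const body = el('body', [
  el('p', [{ text: 'Test' }]),
  el('div', [img('a.png')]),
  el('p', [el('span', [img('b.png')])]),
  el('blockquote', [el('strong', [img('c.png')])]),
]);

console.log(collectImagesShallow(body).map((n) => n.properties.src)); // [ 'a.png' ]
console.log(collectImagesDeep(body).map((n) => n.properties.src)); // [ 'a.png', 'b.png', 'c.png' ]
```

Each recursive call only visits a strictly smaller subtree, so recursion depth is bounded by how deeply the HTML is nested. In an actual fix, an image found inside a paragraph would presumably need to be emitted as an inline drawing run within that paragraph rather than collected separately; the sketch only shows the traversal.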
Hey guys,
Any progress on this?
I can reproduce the issue. In my case, the image with src set to a data url (base64) does not make it into the export even if it's wrapped in a div. Basically, no images are exported. I'm testing the export with MS Word 2013 and LibreOffice 7.
It would be awesome if this was fixed.
@privateOmega Thanks for your work on this! Not an easy feat.
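For reference, an image embedded as a data URL has to be split into its MIME type and decoded bytes before it can be stored as a media part in the .docx. The sketch below uses a hypothetical `parseDataUrl` helper (not part of html-to-docx) to show that decomposition in Node.js. It strips whitespace from the payload, since the test markup posted later in this thread has a space right after `base64,`.

```javascript
// Hypothetical helper (not part of html-to-docx): split a base64 data URL
// into its MIME type and raw bytes. Node.js only (uses Buffer).
function parseDataUrl(src) {
  const commaIndex = src.indexOf(',');
  if (!src.startsWith('data:') || commaIndex === -1) return null;

  // Header between "data:" and the comma, e.g. "image/png;base64".
  const params = src.slice('data:'.length, commaIndex).split(';');
  if (params[params.length - 1].toLowerCase() !== 'base64') return null; // only base64 handled here

  const mimeType = params[0] || 'text/plain';
  // Remove all whitespace, e.g. the space in "base64, iVBOR...".
  const base64 = src.slice(commaIndex + 1).replace(/\s+/g, '');
  return { mimeType, bytes: Buffer.from(base64, 'base64') };
}

// The 5x5 PNG used in the test markup in this thread.
const src = 'data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==';
const parsed = parseDataUrl(src);

console.log(parsed.mimeType); // image/png
// A PNG stores width and height as big-endian uint32 at byte offsets 16 and 20 (IHDR chunk).
console.log(parsed.bytes.readUInt32BE(16), parsed.bytes.readUInt32BE(20)); // 5 5
```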
Thanks. Could you please post a sample of the HTML containing the base64 image you were trying, so I can test it? I have tried PNG and JPEG images in base64, and both render.
Hi, I have the same issue. Here is a non-working example:
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
</head>
<body>
  <p>Test</p>
  <p></p>
  <p>
    <div>
      <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==">
    </div>
  </p>
  <p></p>
  <p>Test</p>
  <p></p>
  <p>
    <div>
      <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==">
    </div>
  </p>
  <p></p>
  <p></p>
</body>
</html>
Just remove the outer p tag around the div and it works, as in the following example:
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
</head>
<body>
  <p>Test</p>
  <p></p>
  <div>
    <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==">
  </div>
  <p></p>
  <p>Test</p>
  <p></p>
  <div>
    <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==">
  </div>
  <p></p>
  <p></p>
</body>
</html>
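Some context on why the outer `<p>` matters: under the HTML5 parsing rules, a `<div>` start tag implicitly closes an open `<p>`, so a browser treats `<p><div><img></div></p>` as an empty paragraph, the div, and another empty paragraph. A converter that builds its tree with a different parser may instead keep the div (and the image) inside the paragraph, where, per the diagnosis earlier in the thread, images are not picked up. Until that is fixed, a crude pre-processing workaround is to unwrap such paragraphs before conversion. The `unwrapDivsFromParagraphs` function below is hypothetical and regex-based, and it only handles the simple pattern in this example.

```javascript
// Hypothetical pre-processing workaround, not part of html-to-docx.
// Unwraps <p> elements whose only content is a single <div>...</div>, which is
// invalid HTML nesting. Regex-based, so it only handles the simple pattern from
// this thread: no attributes on <p>, and no <div> nested inside the <div>.
function unwrapDivsFromParagraphs(html) {
  return html.replace(/<p>\s*(<div\b[^>]*>[\s\S]*?<\/div>)\s*<\/p>/gi, '$1');
}

const input = '<p>Test</p><p> <div> <img src="data:image/png;base64,..."> </div> </p><p></p>';
console.log(unwrapDivsFromParagraphs(input));
// <p>Test</p><div> <img src="data:image/png;base64,..."> </div><p></p>
```

For anything beyond this exact pattern, a real HTML parser would likely be safer. In the browser, for example, parsing with `DOMParser` (`'text/html'`) and reading back `document.body.innerHTML` applies the same implicit-close rules a browser uses for rendering.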
First, sorry for the late reply. Here is how I'm doing it, and the image doesn't make it into the .docx:
Minimal test markup (using the same image as @zeljko-bulatovic above):
<h1>This is the title</h1><p>This is some text.</p><img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="><p>Some more text.</p>
This is how the HTML rendering looks: [screenshot]
Here's the .docx output in Word (image missing): [screenshot]
Attached is the .docx itself, for investigation. [attachment]
Looking into the .docx source itself, I can't find any reference to the image. I'm stumped. Hopefully you'll be able to debug this.
Thank you!
PS: It's worth noting that I'm converting in the browser, not in a Node.js backend.
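When looking through the .docx source, it helps to know where an image would appear. A .docx file is a ZIP archive. In WordprocessingML, an embedded picture normally shows up in `word/document.xml` as a `<w:drawing>` element whose `<a:blip r:embed="rIdN"/>` points to a relationship in `word/_rels/document.xml.rels`, which in turn targets a file under `word/media/`. The hypothetical helper below is a rough, regex-based sketch that works on already-extracted XML strings (for example, after unzipping the .docx with any archive tool). It lists those references, so an empty result confirms that no picture made it into the output.

```javascript
// Hypothetical inspection helper (not part of html-to-docx). Takes the text of
// word/document.xml and word/_rels/document.xml.rels, already extracted from
// the .docx ZIP archive, and lists the image parts the document references.
function listEmbeddedImages(documentXml, relsXml) {
  // Relationship ids referenced by pictures: <a:blip r:embed="rId5"/>
  const embedIds = [...documentXml.matchAll(/<a:blip\b[^>]*\br:embed="([^"]+)"/g)].map((m) => m[1]);

  // Map relationship id -> target, e.g. rId5 -> media/image1.png
  const targets = new Map();
  for (const m of relsXml.matchAll(/<Relationship\b[^>]*>/g)) {
    const id = /\bId="([^"]+)"/.exec(m[0]);
    const target = /\bTarget="([^"]+)"/.exec(m[0]);
    if (id && target) targets.set(id[1], target[1]);
  }
  return embedIds.map((id) => ({ id, target: targets.get(id) ?? null }));
}

// Heavily simplified fragments; real documents wrap a:blip in wp:inline,
// pic:pic and pic:blipFill elements.
const documentXml = '<w:body><w:p><w:r><w:t>Some text.</w:t></w:r></w:p>' +
  '<w:p><w:r><w:drawing><a:blip r:embed="rId5"/></w:drawing></w:r></w:p></w:body>';
const relsXml = '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">' +
  '<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>' +
  '</Relationships>';

console.log(listEmbeddedImages(documentXml, relsXml));
// [ { id: 'rId5', target: 'media/image1.png' } ]
```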
I have the same problem. I need help!