html-to-docx

Images don't render when inside certain tags

Open brockfanning opened this issue 4 years ago • 12 comments

Hello, I'm very excited to find this project and see that it gets around the altchunks problem. Should this library support the output of base64 images that are in the HTML? I gave it a quick test and it did not appear to output any images. I just wanted to make sure I'm not doing something wrong. If we do not support images yet here, consider this a feature request and/or offer of help (if you could point me in the right direction to the relevant code).

Dec 17 '20 19:12 brockfanning

After a bit more experimentation I think it may be a bug caused by the image being wrapped in a <p> tag. I'll update the issue title.

Dec 17 '20 20:12 brockfanning

@brockfanning Could you please post a test HTML string so as to replicate the issue?

Dec 21 '20 04:12 privateOmega

@privateOmega I tested it by adding <p> tags in the node example, like so: https://github.com/brockfanning/html-to-docx/commit/e40026a9db5b089e62359771a788c0dba9566483

Dec 21 '20 04:12 brockfanning

I believe this issue is not limited to p tags. Images also do not render when nested at some level inside most tags other than div, th, or td. In my limited testing this includes:

  • span
  • li
  • blockquote
  • strong
  • i
  • u

Feb 16 '21 14:02 KeithGillette
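To check whether a given HTML string is likely to hit this, a small diagnostic like the one below may help. It is a minimal sketch, not part of html-to-docx: the function names are hypothetical, the tag list is only the set reported in this thread (p, span, li, blockquote, strong, i, u), and the regex-based scan assumes attribute values contain no `>` characters.

```javascript
// Hypothetical diagnostic, not part of html-to-docx.
// Tags that commenters in this thread reported as problematic image ancestors.
const REPORTED_PROBLEM_TAGS = new Set(['p', 'span', 'li', 'blockquote', 'strong', 'i', 'u']);
// Elements that never get a closing tag, so they must not be pushed on the stack.
const VOID_TAGS = new Set(['img', 'br', 'hr', 'meta', 'input', 'link']);

// Returns one array of ancestor tag names (outermost first) per <img> found.
function findImageAncestors(html) {
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
  const stack = [];
  const results = [];
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const isClosing = match[1] === '/';
    const tag = match[2].toLowerCase();
    if (isClosing) {
      // Pop back to the matching open tag, if there is one.
      const index = stack.lastIndexOf(tag);
      if (index !== -1) stack.length = index;
    } else if (tag === 'img') {
      results.push([...stack]);
    } else if (!VOID_TAGS.has(tag)) {
      stack.push(tag);
    }
  }
  return results;
}

// Keeps only the images that sit somewhere inside a reported problem tag.
function imagesAtRisk(html) {
  return findImageAncestors(html).filter((ancestors) =>
    ancestors.some((tag) => REPORTED_PROBLEM_TAGS.has(tag))
  );
}

console.log(imagesAtRisk('<div><img src="a.png"></div><p><span><img src="b.png"></span></p>'));
```

This only reports where images sit in the markup; it does not confirm that html-to-docx will drop them.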

I've spent a few hours poking around, and I think the issue is that the xml-builder only checks for buildImage when processing select elements like divs and tables, but not when building paragraphs of inline elements or blockquotes. Unfortunately, I can't find the right place(s) to insert additional calls to buildImage to correct the problem. My attempts lead to call stack overflows or to only a few additional tags actually working. Any pointers, @privateOmega?

Feb 17 '21 23:02 KeithGillette
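The general idea of handling images at any nesting depth can be illustrated with a toy traversal. This is a sketch under stated assumptions: the node shape (`{ tagName, text, src, children }`) and the name `collectRuns` are hypothetical and do not reflect html-to-docx's actual xml-builder code. It shows one way a single recursive walk can emit images found inside inline wrappers; because each call only descends into a node's children, the recursion ends at the leaves.

```javascript
// Toy model only: the node shape and function name are hypothetical,
// not html-to-docx internals.
function collectRuns(node, runs = []) {
  if (node.tagName === 'img') {
    // Images are leaves: record them regardless of how deeply they are nested.
    runs.push({ type: 'image', src: node.src });
    return runs;
  }
  if (typeof node.text === 'string') {
    runs.push({ type: 'text', value: node.text });
    return runs;
  }
  // Descend into children only, never back into the current node.
  for (const child of node.children || []) {
    collectRuns(child, runs);
  }
  return runs;
}

const paragraph = {
  tagName: 'p',
  children: [
    { text: 'Before ' },
    {
      tagName: 'strong',
      children: [
        { tagName: 'span', children: [{ tagName: 'img', src: 'data:image/png;base64,AAAA' }] },
      ],
    },
    { text: ' after' },
  ],
};

const runs = collectRuns(paragraph);
console.log(runs.map((run) => run.type));
```

Where such a walk would belong in the real builder is exactly the open question above; the sketch only demonstrates the traversal pattern.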

Hey guys,

Any progress on this?

Aug 10 '21 14:08 TheDarkStrix

I can reproduce the issue. In my case, an image with its src set to a data URL (base64) does not make it into the export even if it's wrapped in a div. Basically, no images are exported. I'm testing the export with MS Word 2013 and LibreOffice 7.

It would be awesome if this was fixed.

@privateOmega Thanks for your work on this! Not an easy feat.

Feb 07 '22 17:02 tgv1975

Thanks. Could you please post sample HTML containing the base64 image you were trying, so that I can test it? I have tried PNG and JPEG images in base64, and both render.

Mar 10 '22 19:03 privateOmega

Hi, I have the same issue. Here is the non-working example:

<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
</head>
<body>
  <p>Test</p>
  <p></p>
  <p>
    <div>
      <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==">
    </div>
  </p>
  <p></p>
  <p>Test</p>
  <p></p>
  <p>
    <div>
      <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==">
    </div>
  </p>
  <p></p>
  <p></p>
</body>
</html>

Just remove the outer p tag around the div and it will work, as in the following example:

<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
</head>
<body>
  <p>Test</p>
  <p></p>
  <div>
    <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==">
  </div>
  <p></p>
  <p>Test</p>
  <p></p>
  <div>
    <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==">
  </div>
  <p></p>
  <p></p>
</body>
</html>

Mar 13 '22 09:03 zeljko-bulatovic
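If the HTML comes from an editor you don't control, the workaround above (dropping the p tag that wraps the div) could be applied as a preprocessing step before calling the converter. The following is a minimal sketch, not an html-to-docx feature: `unwrapParagraphsAroundDivs` is a hypothetical name, and the regex only handles a `<p>` whose sole content is a single, non-nested `<div>`.

```javascript
// Hypothetical preprocessing step; not an html-to-docx feature.
function unwrapParagraphsAroundDivs(html) {
  // Matches <p ...> whitespace <div ...> ... </div> whitespace </p>,
  // where the div body contains no other <div> or </div>.
  const pattern = /<p\b[^>]*>\s*(<div\b[^>]*>(?:(?!<\/?div\b)[\s\S])*<\/div>)\s*<\/p>/gi;
  // Keep only the captured <div> ... </div>, discarding the wrapping <p>.
  return html.replace(pattern, '$1');
}

const before = '<p>Test</p><p> <div> <img src="x.png"> </div> </p><p>End</p>';
console.log(unwrapParagraphsAroundDivs(before));
```

A regex is a blunt tool for HTML. For anything beyond simple generated markup, an HTML parser would be a safer way to do the same unwrapping.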

First, sorry for the late reply. Here is how I'm doing it, and the image doesn't make it into the .docx:

Minimal test markup (using the image above, used by @zeljko-bulatovic):

<h1>This is the title</h1><p>This is some text.</p><img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="><p>Some more text.</p>

This is how the HTML rendering looks: image

Here's the .docx output, in Word (image missing): image

Attached, the .docx itself, for investigation:

test.docx

Looking into the .docx source itself, I can't find any reference to the image. I'm stumped. Hopefully you'll be able to debug this.

Thank you!

PS: worth noting that I'm converting in the browser, not in a Node.js backend.

May 25 '22 16:05 tgv1975
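When sharing test markup like the sample above, it can help to first rule out bad input by confirming that the data URL really decodes to a PNG. The sketch below assumes Node.js (it uses the global `Buffer`); `inspectPngDataUrl` is a hypothetical helper, not part of html-to-docx. It also reports whether the payload contains whitespace, since the sample in this thread has a space after `base64,`. Whether that space affects html-to-docx is not established here.

```javascript
// Hypothetical helper for checking test input; not part of html-to-docx.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function inspectPngDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,([\s\S]*)$/.exec(dataUrl);
  if (!match) {
    return { valid: false, reason: 'not a base64 data URL' };
  }
  const payload = match[2];
  // Node's base64 decoder ignores whitespace inside the encoded string.
  const bytes = Buffer.from(payload, 'base64');
  const isPng = PNG_SIGNATURE.every((byte, i) => bytes[i] === byte);
  return {
    valid: isPng,
    mimeType: match[1],
    hasWhitespaceInPayload: /\s/.test(payload),
    byteLength: bytes.length,
  };
}

const sample =
  'data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==';
console.log(inspectPngDataUrl(sample));
```

If the check passes and the image is still missing from the .docx, the input itself is unlikely to be the cause.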

I have the same problem. I need help!

Aug 12 '22 15:08 xiuluo211314