python-docx-template

Images randomly disappear when I put an image in the header and multiple images in the body

Open cw0516 opened this issue 3 years ago • 34 comments

Describe the bug

When I put a static image or an inline-variable image in the header and then put multiple images in the body, one of the body images disappears at random.

To Reproduce

A runnable test case is in my GitHub repository:

https://github.com/cw0516/docxtplBug

Here is the same Python code as in that repository:

////////////////////////////////////////////////////////////////////
from docxtpl import DocxTemplate, InlineImage
from docx.shared import Mm

tpl = DocxTemplate('templates/header_footer_inline_image_tpl.docx')

context = {
    # image referenced from the header of the template
    'inline_image': InlineImage(tpl, 'templates/django.png', height=Mm(10)),
    # images rendered in the body of the template
    'images': [
        InlineImage(tpl, 'templates/django.png', height=Mm(10)),
        InlineImage(tpl, 'templates/django.png', height=Mm(10)),
        InlineImage(tpl, 'templates/django.png', height=Mm(10)),
    ],
}
tpl.render(context)
tpl.save('output/header_footer_inline_image.docx')
///////////////////////////////////////////////////////////////////

The same template docx and image are also in my GitHub repository:

django header_footer_inline_image_tpl.docx

Expected behavior

The image in the header and the images in the body should all be rendered normally.

Screenshots

screenshot

cw0516 avatar Jul 22 '21 03:07 cw0516

Cannot reproduce. Tested on Office 365 online: the generated docx is displayed correctly, even after updating, saving, and re-opening it. I tried many times, and also with more inline images in the body. I was running docxtpl 0.11.5 and Python 3.9.

elapouya avatar Jul 29 '21 19:07 elapouya

Thank you for your help 🙂 How many images were printed? In my Python code, 3 images should be printed in the body, but only 2 show up.

@srutherfordbta how about you?

cw0516 avatar Jul 29 '21 23:07 cw0516

All images are printed : 1 in the Header, 3 in the body. I added up to 7 images in the body and it always worked as expected.

elapouya avatar Jul 30 '21 07:07 elapouya

@cw0516 I executed your code on my laptop and got the same result as you: only two of the three images show in the body.

  • Word: Version 2106 (Build 14131.20332 Click-to-Run)
  • Windows: Windows 10 Pro (Version 10.0.19041 Build 10941)
  • Python 3.7.9 64-bit
  • docxtpl 0.11.5

srutherfordbta avatar Jul 30 '21 13:07 srutherfordbta

I uploaded the output file to Office 365 and viewed the document. It rendered successfully. I think what we have is a rendering bug in the Windows version of MS Word. @cw0516 can you provide your Windows and Word builds for the record?

image

srutherfordbta avatar Jul 30 '21 14:07 srutherfordbta

I forced an MS Office update and Word updated to Version 2107 (Build 14228.20204 Click-to-Run). The rendering issue is still there.

srutherfordbta avatar Jul 30 '21 14:07 srutherfordbta

Yes, my Word version for this issue was 2107 (Build 14228.20204), and my OS is also Windows 10 Pro. When I uploaded my rendered docx file to Office Online, it worked fine.

cw0516 avatar Jul 30 '21 14:07 cw0516

@elapouya I opened the output in Office 365, which renders it correctly, and added two spaces below the third django image in the body. I then saved it to my local laptop and opened it in Windows Word, where it is just fine. Would you mind comparing the internals of this document with the one in the output folder of @cw0516's git repo to see what differs, apart from the two spaces? I hope it is something actionable so we can work around the rendering issue in Windows Word when using docxtpl to create documents.

header_footer_inline_image.docx

srutherfordbta avatar Jul 30 '21 14:07 srutherfordbta

I made a diff of the document before and after modifying it on Office 365: there are some differences, but I cannot see anything that would make images disappear randomly.

snapshot_docx_office365

The problem is that this XML is generated by python-docx, and I cannot see what to do about it.
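
For anyone who wants to repeat this kind of comparison, here is a minimal sketch that diffs the XML parts of two .docx files using only the Python standard library. The function name `changed_parts` is made up for this illustration and is not part of docxtpl or python-docx, and the sketch assumes the XML parts are UTF-8 encoded.

```python
import difflib
import zipfile


def changed_parts(docx_a, docx_b):
    """Return {part_name: unified-diff lines} for the XML parts that differ.

    docx_a and docx_b can be file paths or binary file-like objects.
    """
    diffs = {}
    with zipfile.ZipFile(docx_a) as za, zipfile.ZipFile(docx_b) as zb:
        names_a, names_b = set(za.namelist()), set(zb.namelist())
        for name in sorted(names_a | names_b):
            if not name.endswith((".xml", ".rels")):
                continue  # skip binary media such as word/media/image1.png
            text_a = za.read(name).decode("utf-8") if name in names_a else ""
            text_b = zb.read(name).decode("utf-8") if name in names_b else ""
            if text_a == text_b:
                continue
            # Parts are often written on a single line, so break between
            # tags first to get a readable line-based diff.
            lines_a = text_a.replace("><", ">\n<").splitlines()
            lines_b = text_b.replace("><", ">\n<").splitlines()
            diffs[name] = list(
                difflib.unified_diff(
                    lines_a, lines_b, fromfile=name, tofile=name, lineterm=""
                )
            )
    return diffs
```

Calling it with the generated file and the copy re-saved from Office 365 would list which parts changed, though, as noted above, a textual difference does not by itself say which change matters to Word.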

elapouya avatar Jul 30 '21 17:07 elapouya

@elapouya Would you mind creating an issue on the python-docx git explaining what your code does and what you are seeing? Maybe they have some ideas or hopefully a fix. https://github.com/python-openxml/python-docx/issues

srutherfordbta avatar Jul 30 '21 17:07 srutherfordbta

That is not that easy: InlineImage has been modified by many people on this project, using python-docx internals that I do not know. The first thing I need is a test case. That means I must be able to reproduce the problem (which I currently cannot) and isolate the faulty XML. Right now I do not own MS Word at your version, so I have to buy one...

elapouya avatar Jul 30 '21 19:07 elapouya

You can get an Office 365 account for $7 a month, which includes Word for both PC and Mac. If you have a PayPal account or something similar, I would be happy to help out. You have done such a great job with docxtpl and remained active with new features and maintenance.

srutherfordbta avatar Jul 30 '21 19:07 srutherfordbta

Thank you a lot for your proposal :) But I can buy one myself. Unfortunately my internet access is having some trouble at the moment and I cannot download big files. I expect to have a solution within 2 weeks.

elapouya avatar Jul 30 '21 19:07 elapouya

While coding another feature, I discovered that templates themselves can be "corrupted" in some way, or at least not fully understood by python-docx. Mine was created a long time ago with MS Word 1997. Could you run 2 tests for me with the https://github.com/cw0516/docxtplBug test case:

  1. Try to "sanitize" templates/header_footer_inline_image_tpl.docx by opening it in Office 365, saving it again under the same name, and running the test again.
  2. Instead of using the same image 3 times, try with 3 different images.

elapouya avatar Jul 31 '21 10:07 elapouya

https://user-images.githubusercontent.com/48001039/127737839-40001d25-e6fb-4e35-aa4e-c11a9a7a44b7.mp4

I tried the first test. Is this right?

cw0516 avatar Jul 31 '21 11:07 cw0516

Yes, thank you, that was what I wanted: clearly, the problem remains. Could you try the 2nd test?

elapouya avatar Jul 31 '21 11:07 elapouya

Sure, here are the code, images and result:

image

cw0516 avatar Jul 31 '21 14:07 cw0516

Thank you a lot, this is really incredible. Could you swap lines 11 and 12 to see whether it is always the first image?

elapouya avatar Jul 31 '21 14:07 elapouya

Yeah, it's hard for me to know what happens internally...

image

cw0516 avatar Jul 31 '21 14:07 cw0516

From what you've given me, I think it is a matter of .docx "relationship IDs": one is missing or there is a shift in the IDs. Something tells me that if you remove the header entirely, the 3 images will be displayed. I will look at the internals of the .docx again with this in mind...

elapouya avatar Jul 31 '21 15:07 elapouya

Yes. In my project there was no problem printing images in the body, but after I inserted an image in the header, this issue happened. Anyway, thank you for your work. I will keep watching this issue as a lover of docxtpl 😄

cw0516 avatar Jul 31 '21 15:07 cw0516

I compared the docx internals of your generated docx and of the same file modified on Office 365 online. I cannot see any obvious differences: all internal .xml files are valid (tested in an XML checker). Relationship IDs have been renumbered but still point to the same resources. The things I noticed: 'eastAsia'-related tags have been removed by Office 365 online, and xmlns attributes are not always on the same tag. The XML syntax does not differ between image1, image2 and image3 in the body, so if the first is not shown, the others should not be shown either, which is not the case.

I really need the MS Word version you are using to reproduce this on my side and do deeper testing. I should be able to get that version within 2 weeks, I think...

elapouya avatar Jul 31 '21 16:07 elapouya

Hi all,

I am currently experiencing the same issue described here. We are using python-docx to generate a DOCX file and python-docx-template to replace template tags with specific content (part of it coming from the user). The problem suddenly appeared around 3 weeks ago and affects an image created using matplotlib.pyplot.grid. The image is saved successfully, and it does appear in the DOCX file if no other images are used in the body of the docx. I did some research on this and will leave my findings here in case they are helpful:

First, the problem is not present on all versions of Office. The versions I tried are listed below (❌ for doesn't work, ✔️ for works):

  • For Office 2019 version MSO (16.0.14228.20200) 64-bit ❌
  • For Office 2019 version MSO (16.0.14228.20216) 64-bit ❌
  • For Office 2019 version MSO (16.0.11901.20170) 64-bit ✔️
  • For 365 MSO (16.0.14228.20200) 64-bit (installed on different versions of Windows 10) ❌

Also, exploring the XML files, I noticed that even when the image doesn't appear in Microsoft Office, it still exists in the XML structure. A snippet is included below. I am available for any other tests or information, as this is really bugging us and our clients. Thanks!

<w:p>
    <w:r>
        <w:t xml:space="preserve"/>
    </w:r>
    <w:r>
        <w:drawing>
            <wp:inline
                xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
                xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
                <wp:extent cx="5687568" cy="3986784"/>
                <wp:docPr id="2" name="Picture 2"/>
                <wp:cNvGraphicFramePr>
                    <a:graphicFrameLocks noChangeAspect="1"/>
                </wp:cNvGraphicFramePr>
                <a:graphic>
                    <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
                        <pic:pic>
                            <pic:nvPicPr>
                                <pic:cNvPr id="0" name="hcopytflkd.png"/>
                                <pic:cNvPicPr/>
                            </pic:nvPicPr>
                            <pic:blipFill>
                                <a:blip r:embed="rId18"/>
                                <a:stretch>
                                    <a:fillRect/>
                                </a:stretch>
                            </pic:blipFill>
                            <pic:spPr>
                                <a:xfrm>
                                    <a:off x="0" y="0"/>
                                    <a:ext cx="5687568" cy="3986784"/>
                                </a:xfrm>
                                <a:prstGeom prst="rect"/>
                            </pic:spPr>
                        </pic:pic>
                    </a:graphicData>
                </a:graphic>
            </wp:inline>
        </w:drawing>
    </w:r>
    <w:r>
        <w:t xml:space="preserve"/>
    </w:r>
</w:p>
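
A check like this can also be scripted. The following is only a sketch using the standard library: it finds every `<a:blip r:embed="...">` in one part, resolves the relationship ID through that part's `.rels` file, and reports whether the target file is really in the package. `embedded_images` is a hypothetical helper name, not an API of python-docx or docxtpl, and external (linked) images are not handled.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def embedded_images(docx, part="word/document.xml"):
    """Return [(rId, target, exists_in_package)] for each a:blip in `part`."""
    folder, filename = posixpath.split(part)
    rels_name = f"{folder}/_rels/{filename}.rels"
    found = []
    with zipfile.ZipFile(docx) as z:
        names = set(z.namelist())
        # Relationship IDs are scoped to the part that owns the .rels file.
        targets = {}
        if rels_name in names:
            rels_root = ET.fromstring(z.read(rels_name))
            for rel in rels_root.iter(f"{{{PKG_REL_NS}}}Relationship"):
                targets[rel.get("Id")] = rel.get("Target")
        for blip in ET.fromstring(z.read(part)).iter(f"{{{A_NS}}}blip"):
            rid = blip.get(f"{{{R_NS}}}embed")
            target = targets.get(rid)
            path = None
            if target:
                # Targets are usually relative to the folder of the part.
                path = posixpath.normpath(posixpath.join(folder, target)).lstrip("/")
            found.append((rid, target, path in names))
    return found
```

If every entry comes back with `True`, as the snippet above suggests it would, the image data and its relationship are intact, and the cause has to be sought elsewhere in the markup.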

dragosnsandu avatar Aug 09 '21 14:08 dragosnsandu

@dragosnsandu Thanks for nailing down our suspicions around a new version of Windows Word being the catalyst for the break. Do you have images in the header and/or footer as well? For the record, I am generating my images with graphviz as png files, but I don't think the source of the image is a concern at this point.

srutherfordbta avatar Aug 10 '21 14:08 srutherfordbta

@srutherfordbta Yes, I confirm that we also have images in the header section of the document. The image in the header is not added programmatically; it is included via the DOCX template I am using.

Another detail: the issue is not related to which version of the docxtpl package we are using (the same behavior occurs with both version 0.8.1 and the most recent one, 0.11.5).

dragosnsandu avatar Aug 10 '21 14:08 dragosnsandu

That makes sense regarding the docxtpl version. The answer to "What has changed?" is the Windows version of Word. It is hard to say whether it is a bug in the Windows Word version or something intentionally deprecated or dropped, with Office 365 and the Mac versions getting the same code update down the road. I don't think any of us have insight into that. My fear at this point is that the Office 365 version gets updated with the same code and we lose the workaround of opening the document in Office 365, making a small change, saving, downloading and opening it correctly in Windows Word. I am really hoping Eric can do some wizardry to get docxtpl to generate the same structure that Office 365 creates once he gets his version of Windows Word.

srutherfordbta avatar Aug 10 '21 14:08 srutherfordbta

I just had a look at the docx repo issues and it looks like they are seeing the same thing in the issue below. Let's keep an eye on their issue to see if they come to any kind of resolution. https://github.com/python-openxml/python-docx/issues/981#issuecomment-891395503

srutherfordbta avatar Aug 13 '21 15:08 srutherfordbta

I finally succeeded in reproducing the problem with MS Word Version 2107 on Windows 10. Now I can investigate...

elapouya avatar Aug 14 '21 13:08 elapouya

After building 30+ docx files by hand with zip, I found the reason: there is an ID collision between the header image and the body images:

<wp:docPr id="1" name="Picture 1"/>

This is the same for the header picture and the first picture in the body. If I manually set a random ID number on the image in the header, all images are displayed in the body.
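
This kind of collision can be looked for without unzipping by hand. A minimal sketch, assuming the drawings use the standard `wp:docPr` element. `docpr_id_collisions` is a name invented for this illustration and is not part of docxtpl or python-docx.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def docpr_id_collisions(docx):
    """Return {id: [(part, name), ...]} for wp:docPr ids used more than once.

    All XML parts under word/ are scanned together, so an id shared by a
    header part and the document body is reported.
    """
    seen = defaultdict(list)
    with zipfile.ZipFile(docx) as z:
        for part in sorted(z.namelist()):
            if not (part.startswith("word/") and part.endswith(".xml")):
                continue
            root = ET.fromstring(z.read(part))
            for doc_pr in root.iter(f"{{{WP_NS}}}docPr"):
                seen[doc_pr.get("id")].append((part, doc_pr.get("name")))
    return {i: uses for i, uses in seen.items() if len(uses) > 1}
```

For a file with the problem described here, the result would be expected to contain an entry for id "1" listing both a header part and word/document.xml.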

I have no idea yet how python-docx manages this ID; I am investigating...
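
Independently of how the ID gets assigned, the manual fix can be approximated by post-processing an already generated file. The sketch below is an untested-against-Word stopgap, not how docxtpl or python-docx handle it: it rewrites every `wp:docPr` id under word/ with a single running counter so that no two drawings share an id. It assumes the conventional `wp:` namespace prefix and UTF-8 parts, and `renumber_docpr_ids` is a hypothetical name.

```python
import itertools
import re
import zipfile

# Matches the numeric id attribute of a <wp:docPr ...> start tag.
# Assumption: the drawing namespace is bound to the usual "wp" prefix.
DOCPR_ID = re.compile(r'(<wp:docPr\b[^>]*?\sid=")\d+(")')


def renumber_docpr_ids(src, dst):
    """Copy the docx at `src` to `dst`, giving every wp:docPr a unique id."""
    counter = itertools.count(1)

    def new_id(match):
        return f"{match.group(1)}{next(counter)}{match.group(2)}"

    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(
        dst, "w", zipfile.ZIP_DEFLATED
    ) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.startswith("word/") and info.filename.endswith(".xml"):
                # Text substitution keeps the rest of the XML byte-for-byte,
                # including namespace declarations.
                data = DOCPR_ID.sub(new_id, data.decode("utf-8")).encode("utf-8")
            zout.writestr(info.filename, data)
```

Usage would be along the lines of `renumber_docpr_ids('output/header_footer_inline_image.docx', 'output/fixed.docx')`, where `fixed.docx` is just an example name.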

elapouya avatar Aug 14 '21 15:08 elapouya

Could you please try docxtpl 0.12.0, which I just released? It should solve the problem.

elapouya avatar Aug 15 '21 13:08 elapouya