Open-XML-SDK

AddAlternativeFormatImportPart produces malformed documents if both parent and child contain shape-lines or similar objects

Open Pxtl opened this issue 4 years ago • 6 comments

Description

The output document is malformed when I use AddAlternativeFormatImportPart and both the root document and the attached document contain a shape-line (or a similar object).

Information

  • .NET Target: .NET Framework 4.6.2
  • DocumentFormat.OpenXml Version: 2.12.3

Repro

Create two Word documents, "Template.docx" and "ExampleAttachment.docx". Insert a single shape-line in both. Note this was originally discovered using a shape-line and an attached image, but a shape-line alone is enough to reproduce it.

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ReproAddAlternativeFormatImportPartCorruption
{
    class Program
    {
        static void Main(string[] args)
        {
            var iterations = 1;
            // HH (24-hour clock) keeps file names unique across AM/PM runs.
            var outputPath = $"test_{DateTime.Now.ToString("yyyyMMdd-HHmmss")}.docx";
            var attachmentPath = "ExampleAttachment.docx";
            File.Copy("Template.docx", outputPath);

            var attachmentBytes = File.ReadAllBytes(attachmentPath);
            using (var doc = WordprocessingDocument.Open(outputPath, true))
            {
                var firstPar = doc.MainDocumentPart.Document.Body.FirstChild;

                for (int i = 0; i < iterations; i++)
                {
                    var attachmentHeading = $"{attachmentPath} {i}";
                    var xmlFileId = attachmentHeading
                        .Replace(".", "_")
                        .Replace(" ", "_");

                    var compositeElements = AddDocumentAttachment(attachmentBytes, attachmentHeading, xmlFileId, doc.MainDocumentPart, AlternativeFormatImportPartType.WordprocessingML);
                    foreach (var element in compositeElements)
                    {
                        firstPar.InsertBeforeSelf(element);
                    }
                }
                doc.Save();
            }

        }

        private static IEnumerable<OpenXmlCompositeElement> AddDocumentAttachment(
            byte[] fileData, 
            string attachmentHeading, 
            string xmlFileId, 
            MainDocumentPart mainPart, 
            AlternativeFormatImportPartType? alternativeFormatImportPartType)
        {

            // Write the attachment bytes into a new altChunk part; the stream is flushed and closed when the using blocks are disposed.
            var chunk = mainPart.AddAlternativeFormatImportPart(alternativeFormatImportPartType.Value, xmlFileId);
            using (var chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
            using (var w = new BinaryWriter(chunkStream))
            {
                w.Write(fileData);
                w.Flush();
            }

            var altChunk = new AltChunk
            {
                Id = xmlFileId
            };

            mainPart.Document.Save();

            return new OpenXmlCompositeElement[]{
                altChunk
            };

        }
    }
}

Observed

Opening the file in Word (Office 2019, version 1808, build 10372.20060) produces the following error:

We're sorry. We can't open .docx because we found a problem with its contents [OK] [Details]

Details:

HRESULT 0x80004005

Location: Part: /word/document.xml, Line: 0, Column: 0

Expected

Create a document that does not cause errors.

Additional notes

Initially I discovered this issue happening inconsistently - like, I would attach 4 documents and it would be okay, but 5 was too much. And the only attached objects within the document were images.

Now, I can raise it consistently with the method described above, using documents created from scratch. If needed, I can send the docx files that will reproduce this issue.

If I use the same process without a shape-line drawn into one or both of the files, the error does not occur.

Pxtl avatar Apr 07 '21 18:04 Pxtl

@Pxtl thanks for reporting the issue. Did this happen in previous builds of the SDK? Did you try it with other versions of Office?

tomjebo avatar Apr 07 '21 20:04 tomjebo

@tomjebo

I have not tried other versions of either Word or Open-XML-SDK. I was implementing a new feature in our software that leverages the library... sadly we didn't find out about this problem until we were user-acceptance-testing. The bug in more complex documents is so inconsistent that it slipped by all of our tests; it worked fine until we added a few more attachments and it exceeded some invisible limit.

Pxtl avatar Apr 07 '21 20:04 Pxtl

@Pxtl Thanks, I'll take a look.

tomjebo avatar Apr 07 '21 20:04 tomjebo

@Pxtl sorry for taking so long to investigate this. The problem Word is having is with the wp:docPr element in the attached document. When you pull in that attached document, Word treats it like part of the attaching package. Running through this scenario as you described, the "template" and "attachment" documents each end up with a straight-line wp:docPr element that has the same id=1 attribute. This is considered non-conformant per ISO 29500-1 20.4.2.5 docPr (Drawing Object Non-Visual Properties), which says:

id (UniqueIdentifier)
...

If multiple objects within the same document share the same id attribute value, then the document shall be considered non-conformant. [Example: Consider a DrawingML object defined as follows:<… id="10" … >

The SDK's DocumentFormat.OpenXml.Packaging.AddAlternativeFormatImportPart function doesn't inspect attachments in any way, and I don't think it would be very feasible to do so. If you can create your template so that its wp:docPr id attributes are unlikely to collide with those of any attachments, then that would probably be the best approach. Alternatively, you could write a utility function that scans would-be attachment documents and compares their ids with those of the attaching document to make sure there are no collisions.
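
A rough sketch of what such a scanning utility might look like, assuming it is enough to compare the main document parts (headers, footers and other parts are ignored here). DocPrIdCheck, GetDocPrIds and FindCollisions are made-up names for illustration, not SDK APIs; the sketch relies on the SDK's DocProperties class, which maps to wp:docPr.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DW = DocumentFormat.OpenXml.Drawing.Wordprocessing;

// Hypothetical helper, not part of the SDK.
static class DocPrIdCheck
{
    // Collects every wp:docPr id found in the main document part.
    public static HashSet<uint> GetDocPrIds(WordprocessingDocument doc)
    {
        return new HashSet<uint>(
            doc.MainDocumentPart.Document
               .Descendants<DW.DocProperties>()
               .Where(p => p.Id != null && p.Id.HasValue)
               .Select(p => p.Id.Value));
    }

    // Returns the wp:docPr ids that appear in both the host document
    // and the would-be attachment, in ascending order.
    public static List<uint> FindCollisions(
        WordprocessingDocument host, byte[] attachmentBytes)
    {
        using (var stream = new MemoryStream(attachmentBytes))
        using (var attachment = WordprocessingDocument.Open(stream, false))
        {
            var shared = GetDocPrIds(host);
            shared.IntersectWith(GetDocPrIds(attachment));
            return shared.OrderBy(id => id).ToList();
        }
    }
}
```

In the repro above, this could be called as DocPrIdCheck.FindCollisions(doc, attachmentBytes) before AddDocumentAttachment; a non-empty result means the attachment shares at least one wp:docPr id with the template.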

Let me know if this is clear or you have more questions.

tomjebo avatar Apr 16 '21 22:04 tomjebo

@Pxtl I've added the feature request tag. If you would like to submit a PR with a solution we'd be happy to review and consider it.

tomjebo avatar Apr 20 '21 20:04 tomjebo

Before a PR, it would be great to continue the discussion as to how this should even work. Potentially, anything that adds another part could go through and explore it to see if there are collisions.
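
To make that discussion a bit more concrete, here is one shape a fix-up step could take in caller code: rather than only detecting collisions, rewrite the attachment's wp:docPr ids before its bytes are written into the chunk. This is only a sketch, not a proposed SDK API. DocPrRenumbering, MaxDocPrId and Renumber are hypothetical names, only main document parts are touched, and it assumes nothing else in the attachment depends on the original id values.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DW = DocumentFormat.OpenXml.Drawing.Wordprocessing;

// Hypothetical helper, not part of the SDK.
static class DocPrRenumbering
{
    // Highest wp:docPr id in the main document part, or 0 if there is none.
    public static uint MaxDocPrId(WordprocessingDocument doc)
    {
        return doc.MainDocumentPart.Document
            .Descendants<DW.DocProperties>()
            .Where(p => p.Id != null && p.Id.HasValue)
            .Select(p => p.Id.Value)
            .DefaultIfEmpty(0u)
            .Max();
    }

    // Returns a copy of the attachment in which every wp:docPr in the main
    // document part gets a fresh id above lastUsedId. lastUsedId is advanced
    // so the same counter can be reused for the next attachment.
    public static byte[] Renumber(byte[] attachmentBytes, ref uint lastUsedId)
    {
        using (var stream = new MemoryStream())
        {
            // Copy into an expandable stream so the package can be edited.
            stream.Write(attachmentBytes, 0, attachmentBytes.Length);

            using (var attachment = WordprocessingDocument.Open(stream, true))
            {
                var document = attachment.MainDocumentPart.Document;
                foreach (var docPr in document.Descendants<DW.DocProperties>())
                {
                    docPr.Id = ++lastUsedId;
                }
                document.Save();
            }

            return stream.ToArray();
        }
    }
}
```

A caller could seed lastUsedId with MaxDocPrId(doc) for the host document and pass the same variable for each attachment, so that ids keep increasing across several chunks. Whether this kind of rewriting belongs in the SDK or in caller code is exactly the open question here.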

twsouthwick avatar May 17 '21 18:05 twsouthwick