docx-rs

*Empty* document has 4 OOXML Validator errors

Open qtfkwk opened this issue 1 year ago • 5 comments

Describe the bug

I've been using this library to generate docx files that work fine in LibreOffice, but I was surprised when Microsoft Word reported that they needed to be "recovered"; the recovery then failed and Word said the files were "corrupt."

These files use many features (paragraph and character styles, images, tables, ...) and produced hundreds of issues in the OOXML Validator VSCode extension (see also: mikeebowen/ooxml-validator-vscode). To check whether I was "doing something wrong", I ran the validator against an empty file: a docx created with this library, with nothing added, then built and saved.

I found that even this file had 4 errors (listed under "Actual behavior"), so it seemed appropriate to start with those.

Reproduced step

Steps to reproduce the behavior:

cargo new docx-test
cd docx-test
cargo add anyhow docx-rs
cat <<EOF >src/main.rs
use anyhow::Result;
use docx_rs::*;

fn main() -> Result<()> {
    let docx = Docx::new();
    docx.build().pack(std::fs::File::create("test.docx")?)?;
    Ok(())
}
EOF
cargo run
  1. Open VS Code.
  2. Install the OOXML Validator VSCode extension.
  3. Open the docx-test folder.
  4. Right-click the generated test.docx, select Validate OOXML, wait for validation to complete, click the View Errors button.

For reference, I've attached the test.docx.

Expected behavior

Any generated docx file (whether empty or using any/all features) should open without issue in Microsoft Word and pass a validator.

Actual behavior

The generated test.docx opens in Word without issue, but the OOXML Validator VSCode extension produced 4 errors:

[
  {
    "Id": "Sch_UnexpectedElementContentExpectingComplex",
    "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:pPr'.",
    "Namespaces": {},
    "XPath": "/w:styles[1]/w:style[1]",
    "PartUri": "/word/styles.xml",
    "ErrorType": "Schema"
  },
  {
    "Id": "Sch_InvalidElementContentExpectingComplex",
    "Description": "The element has invalid child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:rPr'. List of possible elements expected: <http://schemas.openxmlformats.org/wordprocessingml/2006/main:keepNext>.",
    "Namespaces": {},
    "XPath": "/w:styles[1]/w:style[1]/w:pPr[1]",
    "PartUri": "/word/styles.xml",
    "ErrorType": "Schema"
  },
  {
    "Id": "Sch_InvalidElementContentExpectingComplex",
    "Description": "The element has invalid child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:rPr'. List of possible elements expected: <http://schemas.openxmlformats.org/wordprocessingml/2006/main:keepNext>.",
    "Namespaces": {},
    "XPath": "/w:styles[1]/w:docDefaults[1]/w:pPrDefault[1]/w:pPr[1]",
    "PartUri": "/word/styles.xml",
    "ErrorType": "Schema"
  },
  {
    "Id": "Sch_UnexpectedElementContentExpectingComplex",
    "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:zoom'.",
    "Namespaces": {},
    "XPath": "/w:settings[1]",
    "PartUri": "/word/settings.xml",
    "ErrorType": "Schema"
  }
]

Desktop (please complete the following information)

  • OS: Debian Linux

qtfkwk, Jun 21 '24 12:06

Created a repository to track my testing: https://github.com/qtfkwk/docx-test

Please feel free to use it.

qtfkwk, Jun 21 '24 19:06

@qtfkwk Thanks!!!!!

bokuweb, Jun 22 '24 00:06

For what it's worth, I can open docx-rs generated documents in Word without them being flagged as corrupted. However, I have seen cases where I had to experiment with various parts of the library to make sure everything was properly declared everywhere: for example styles, numbering, or attachments, which may need to be adequately "linked/declared". On this front LibreOffice seems a tad more permissive, even if it may end up not rendering things correctly, possibly because of these missing elements.

git-noise, Jun 23 '24 13:06

I'll do some investigation.

bokuweb, Jun 23 '24 23:06

I can contribute a fix for this. The problem is element order: child elements must appear in the order the OOXML schema requires.

Update: I have fixed the issue and raised PR https://github.com/bokuweb/docx-rs/pull/735

ImplFerris, Jul 03 '24 04:07
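
To illustrate the element-order point, here is a minimal Rust sketch of how a sequence check can flag a child that appears too late. It is not the code from the PR and not how the validator is implemented. The `first_out_of_order` helper is hypothetical, the `style_order` list is only an abbreviated excerpt of the children allowed in `w:style`, and the "generated" child order is an assumption chosen to match the first validator error, not the XML docx-rs actually wrote.

```rust
/// Hypothetical helper: returns the first child that appears after an element
/// which the schema sequence says must come later. Names that are not listed
/// in `schema_order` are skipped.
fn first_out_of_order<'a>(children: &[&'a str], schema_order: &[&str]) -> Option<&'a str> {
    let mut highest_seen = 0usize;
    for &child in children {
        if let Some(pos) = schema_order.iter().position(|&name| name == child) {
            if pos < highest_seen {
                // An element with a later schema position was already seen.
                return Some(child);
            }
            highest_seen = pos;
        }
    }
    None
}

fn main() {
    // Assumption: abbreviated excerpt of the w:style child sequence,
    // in which w:pPr must precede w:rPr. Not the full schema list.
    let style_order = ["w:name", "w:basedOn", "w:next", "w:qFormat", "w:pPr", "w:rPr"];

    // Assumed (not confirmed) child order that would trigger
    // "unexpected child element ... pPr" on /w:styles[1]/w:style[1].
    let generated = ["w:name", "w:rPr", "w:pPr"];
    println!("{:?}", first_out_of_order(&generated, &style_order)); // Some("w:pPr")

    // The same children in schema order pass the check.
    let reordered = ["w:name", "w:pPr", "w:rPr"];
    println!("{:?}", first_out_of_order(&reordered, &style_order)); // None
}
```

A check like this only covers ordering. The two `rPr` errors reported inside `w:pPr` look like a different kind of problem (a child the validator does not accept in that position at all), so reordering alone may not address them.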