Allow access to other XML docs in docx file like the header and footer

Open yjukaku opened this issue 5 years ago • 9 comments

This adds support for retrieving all of the header and footer documents embedded in the docx file, as well as the numbering docs.

This is based on the work in #22 and #42.

It also closes #49 and #32.

yjukaku avatar Oct 16 '19 19:10 yjukaku

@chrahunt we need this solution from @yjukaku

fercreek avatar Nov 24 '19 22:11 fercreek

👋 Is there anything holding up this PR from merging? Anything we can do to help?

yjukaku avatar Oct 08 '20 16:10 yjukaku

This PR would solve a problem I am currently encountering (namely, setting a bookmark in a header). I am willing to help get this PR merged. What is holding it back?

nathanvda avatar Jun 28 '21 17:06 nathanvda

There is a merge conflict now.

@yjukaku do you have time to resolve the conflict?

satoryu avatar Jun 29 '21 08:06 satoryu

I tried to get this working. The main difference I see now is that for Office365 files we have to try document.xml first and, if that does not exist, fall back to document2.xml.

So I created a local version that is more in line with the current code: instead of iterating over DOCUMENT_PATHS, I added explicit methods to load the headers, footers, and numbering, as we already have a load_styles too.

But when trying to adapt the update method accordingly, I noticed that we only update word/document.xml regardless of the source (leaving document2.xml as is?), and I am not sure whether that is a problem. Can I ignore that for now?
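The fallback described above can be sketched as follows. This is only an illustration: `MAIN_DOCUMENT_CANDIDATES` and `main_document_path` are hypothetical names, not part of the library, and the sketch assumes the only two names that occur are the ones mentioned in this thread.

```ruby
# Candidate zip entry names for the main document part, in order of preference.
# Assumption: only these two names occur; the thread mentions no others.
MAIN_DOCUMENT_CANDIDATES = ["word/document.xml", "word/document2.xml"].freeze

# Hypothetical helper: returns the first candidate that is present in the
# list of zip entry names, or raises if none of them is.
def main_document_path(entry_names)
  path = MAIN_DOCUMENT_CANDIDATES.find { |name| entry_names.include?(name) }
  raise ArgumentError, "no main document part found" if path.nil?

  path
end
```

If the path chosen at load time is remembered, the update step could write back to that same entry instead of always writing word/document.xml.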

nathanvda avatar Jun 29 '21 15:06 nathanvda

> I added explicit methods to load_headers and footers and numbering, as we already have a load_styles too.

I was trying to DRY the code with the DOCUMENT_PATHS hash, but if that's not needed 🤷‍♂️ .

> Can I ignore that for now?

I personally would expect the document file name to stay the same as the original when updated. A better way to find the proper document name appears to be to check the [Content_Types].xml file in the zip and look for an Override tag whose ContentType attribute has the value application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml. That tells us exactly which file is the "main" one, and a similar method can be used for the headers, footers, numbering, styles, etc.

See http://officeopenxml.com/anatomyofOOXML.php under Content Types

Here's a sample [Content_Types].xml:

<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/_rels/document.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/>
  <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
  <Override PartName="/word/numbering.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml"/>
  <Override PartName="/word/footer1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/>
  <Override PartName="/word/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
  <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
  <Override PartName="/customXml/_rels/item1.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/customXml/itemProps1.xml" ContentType="application/vnd.openxmlformats-officedocument.customXmlProperties+xml"/>
  <Override PartName="/customXml/item1.xml" ContentType="application/xml"/>
  <Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml"/>
  <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
  <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
</Types>
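
A minimal sketch of that lookup, assuming the contents of [Content_Types].xml have already been read from the zip as a string. `part_names_for` and the constants are hypothetical names used for illustration, not existing library API; REXML is used only to keep the example self-contained.

```ruby
require "rexml/document"

# Content types taken from the sample above.
MAIN_DOCUMENT_TYPE =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
FOOTER_TYPE =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"

# Hypothetical helper: returns the zip entry names of every <Override>
# element whose ContentType attribute equals `content_type`.
def part_names_for(content_types_xml, content_type)
  doc = REXML::Document.new(content_types_xml)
  names = []
  doc.root.each_element do |element|
    next unless element.name == "Override"
    next unless element.attributes["ContentType"] == content_type

    # PartName is package-absolute ("/word/document.xml");
    # zip entry names carry no leading slash.
    names << element.attributes["PartName"].delete_prefix("/")
  end
  names
end

# Shortened version of the sample file, for demonstration only.
SAMPLE_CONTENT_TYPES = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
    <Override PartName="/word/document.xml" ContentType="#{MAIN_DOCUMENT_TYPE}"/>
    <Override PartName="/word/footer1.xml" ContentType="#{FOOTER_TYPE}"/>
  </Types>
XML

main_parts = part_names_for(SAMPLE_CONTENT_TYPES, MAIN_DOCUMENT_TYPE)
footer_parts = part_names_for(SAMPLE_CONTENT_TYPES, FOOTER_TYPE)
```

Returning an array rather than a single name lets the same helper cover headers and footers, of which a document can have several.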

yjukaku avatar Jun 29 '21 16:06 yjukaku

Can we merge this PR as well? I need access to numbering and header/footer. Thanks.

aunghtain avatar Mar 12 '22 17:03 aunghtain

Thanks, the proposed change seems good at a high level to me. (I'm not affiliated with the project, just someone who has started using the library.) This would be helpful for one case I saw today where the important text information we wanted was in the document footer. Right now that information is inaccessible.

I wouldn't want to delay this PR, but what do you think about adding the header or footer contents to methods like .text on documents? Maybe it could take the contents of any headers and put that at the top of the document text, and the contents of the footers at the end. That way document.text would truly give you all of the text of the document.
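
As a rough illustration of that ordering, assuming the header, body, and footer paragraphs are already available as arrays of strings and that paragraphs are joined with newlines (both assumptions; `full_text` is a hypothetical helper, not an existing method):

```ruby
# Hypothetical helper: concatenates paragraph text in reading order,
# headers first, then the body, then footers.
# Each argument is an array of strings, one string per paragraph.
def full_text(headers:, body:, footers:)
  (headers + body + footers).join("\n")
end

text = full_text(
  headers: ["Quarterly report"],
  body: ["First paragraph.", "Second paragraph."],
  footers: ["Internal use only"]
)
```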

panozzaj avatar Mar 25 '22 22:03 panozzaj

Any update on this? I've been waiting for it for more than a year now.

aunghtain avatar Nov 14 '23 19:11 aunghtain