Memory problems when modifying slide containing lots of elements

Open sorensenmatias opened this issue 3 years ago • 2 comments

Description

I have a presentation with a lot of vector graphics on the slides. I want to inspect the SlidePart of each Slide using the Descendants method and modify some elements. This loads the entire DOM of the slide into memory, which ends up consuming gigabytes of memory. Memory profiling shows that each slide allocates roughly 200,000 objects.

There seems to be no way to avoid loading the entire DOM into memory. I have experimented with using the OpenXmlReader to read the desired elements from the SlidePart, modify them accordingly, and then write them back using the OpenXmlWriter. This avoids loading the DOM of all elements, but it is hard to get right because the elements returned by the OpenXmlReader are different instances and are therefore detached from the SlidePart and Slide. This makes subsequent modifications of elements of the SlidePart difficult.
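To make the streaming read-modify-write pattern concrete outside the SDK, here is a minimal sketch in Python using only the standard library's SAX filter. It is not the Open XML SDK API and does not replace OpenXmlReader/OpenXmlWriter; the `RenameShapes` class, the `stream_edit` function, and the renaming rule are hypothetical and chosen purely for illustration. The slide XML is passed through event by event, and one attribute is rewritten on the way, so no element tree is ever built.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameShapes(XMLFilterBase):
    """Forward every SAX event unchanged, except for one rewritten attribute.

    Hypothetical rule for illustration: each <p:cNvPr> gets a name derived
    from its id. Namespace processing is left off, so element names arrive
    as written in the file (for example "p:cNvPr").
    """

    def startElement(self, name, attrs):
        if name == "p:cNvPr":
            # Copy into a plain dict because the parser's attribute object
            # is read-only.
            attrs = dict(attrs.items())
            attrs["name"] = "Shape " + attrs.get("id", "?")
        super().startElement(name, attrs)


def stream_edit(source, sink):
    """Read XML from a binary stream and write the edited XML to a text stream."""
    reader = RenameShapes(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(sink, encoding="utf-8"))
    reader.parse(source)


if __name__ == "__main__":
    slide = (
        b'<p:sld xmlns:p="urn:example:p"><p:sp><p:nvSpPr>'
        b'<p:cNvPr id="206" name="Freeform 11"/>'
        b"</p:nvSpPr></p:sp></p:sld>"
    )
    out = io.StringIO()
    stream_edit(io.BytesIO(slide), out)
    print(out.getvalue())
```

Memory use stays roughly constant with respect to slide size because each event is written out as soon as it is read. The trade-off is the one described above: nothing in the stream is attached to a live object model, so any edit that needs context from elsewhere in the part has to be planned before the pass starts.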

I think there needs to be an easy way to modify elements without consuming large amounts of memory.

Information

  • .NET Target: .NET Core 3.1
  • DocumentFormat.OpenXml Version: 2.11.3

Repro

Create a presentation containing a slide with a lot of shapes with vector graphics and access the Slide property of the SlidePart.

<p:sp>
  <p:nvSpPr>
    <p:cNvPr id="206" name="Freeform 11" />
    <p:cNvSpPr>
      <a:spLocks />
    </p:cNvSpPr>
    <p:nvPr />
  </p:nvSpPr>
  <p:spPr bwMode="auto">
    <a:xfrm>
      <a:off x="7046" y="2799" />
      <a:ext cx="6" cy="21" />
    </a:xfrm>
    <a:custGeom>
      <a:avLst />
      <a:gdLst>
        <a:gd name="T0" fmla="*/ 3 w 3" />
        <a:gd name="T1" fmla="*/ 5 h 12" />
        <a:gd name="T2" fmla="*/ 1 w 3" />
        <a:gd name="T3" fmla="*/ 11 h 12" />
        <a:gd name="T4" fmla="*/ 1 w 3" />
        <a:gd name="T5" fmla="*/ 7 h 12" />
        <a:gd name="T6" fmla="*/ 3 w 3" />
        <a:gd name="T7" fmla="*/ 1 h 12" />
        <a:gd name="T8" fmla="*/ 3 w 3" />
        <a:gd name="T9" fmla="*/ 5 h 12" />
      </a:gdLst>
      <a:ahLst />
      <a:cxnLst>
        <a:cxn ang="0">
          <a:pos x="T0" y="T1" />
        </a:cxn>
        <a:cxn ang="0">
          <a:pos x="T2" y="T3" />
        </a:cxn>
        <a:cxn ang="0">
          <a:pos x="T4" y="T5" />
        </a:cxn>
        <a:cxn ang="0">
          <a:pos x="T6" y="T7" />
        </a:cxn>
        <a:cxn ang="0">
          <a:pos x="T8" y="T9" />
        </a:cxn>
      </a:cxnLst>
      <a:rect l="0" t="0" r="r" b="b" />
      <a:pathLst>
        <a:path w="3" h="12">
          <a:moveTo>
            <a:pt x="3" y="5" />
          </a:moveTo>
          <a:cubicBezTo>
            <a:pt x="2" y="8" />
            <a:pt x="2" y="11" />
            <a:pt x="1" y="11" />
          </a:cubicBezTo>
          <a:cubicBezTo>
            <a:pt x="0" y="12" />
            <a:pt x="0" y="10" />
            <a:pt x="1" y="7" />
          </a:cubicBezTo>
          <a:cubicBezTo>
            <a:pt x="1" y="4" />
            <a:pt x="2" y="1" />
            <a:pt x="3" y="1" />
          </a:cubicBezTo>
          <a:cubicBezTo>
            <a:pt x="3" y="0" />
            <a:pt x="3" y="2" />
            <a:pt x="3" y="5" />
          </a:cubicBezTo>
          <a:close />
        </a:path>
      </a:pathLst>
    </a:custGeom>
    <a:grpFill />
    <a:ln>
      <a:noFill />
    </a:ln>
    <a:extLst>
      <a:ext uri="{91240B29-F687-4F45-9708-019B960494DF}">
        <a14:hiddenLine xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main" w="9525">
          <a:solidFill>
            <a:srgbClr val="000000" />
          </a:solidFill>
          <a:round />
          <a:headEnd />
          <a:tailEnd />
        </a14:hiddenLine>
      </a:ext>
    </a:extLst>
  </p:spPr>
  <p:txBody>
    <a:bodyPr vert="horz" wrap="square" lIns="91440" tIns="0" rIns="72000" bIns="0" numCol="1" anchor="t" anchorCtr="0" compatLnSpc="1">
      <a:prstTxWarp prst="textNoShape">
        <a:avLst />
      </a:prstTxWarp>
    </a:bodyPr>
    <a:lstStyle />
    <a:p>
      <a:endParaRPr lang="en-GB">
        <a:solidFill>
          <a:schemeClr val="tx2" />
        </a:solidFill>
      </a:endParaRPr>
    </a:p>
  </p:txBody>
</p:sp>

Observed

Memory exhaustion occurs.

Expected

Some way to modify big presentations without loading everything into memory.

sorensenmatias avatar Oct 09 '20 08:10 sorensenmatias

Hi @sorensenmatias - this may be related to #807, but can you include an example of the code you are using to do this?

twsouthwick avatar Oct 13 '20 22:10 twsouthwick

Hi, I have also tried to modify a large worksheet (writing the Columns OpenXML element after writing the sheetData, using a method like InsertBefore). Memory consumption is so high that the process never completes. See the related SO question.

Version: 2.5.0 (but occurs using 2.11 as well)
Framework: .NET 4.6.1

amoraitis avatar Oct 20 '20 07:10 amoraitis