Open-XML-SDK icon indicating copy to clipboard operation
Open-XML-SDK copied to clipboard

SAX Approach Replace Example

Open AlexHedley opened this issue 5 years ago • 9 comments

I'm looking for an example of using the SAX approach (instead of DOM) to open a large file and perform a replace on a given value, then save the file.

I think this would be a useful addition to the Documentation.

How to: Search and replace text in a document part (Open XML SDK) https://docs.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part

This uses a Stream.

How to: Parse and read a large spreadsheet document (Open XML SDK) https://docs.microsoft.com/en-us/office/open-xml/how-to-parse-and-read-a-large-spreadsheet

// The SAX approach.
static void ReadExcelFileSAX(string fileName)
{
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

        OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
        string text;
        while (reader.Read())
        {
            if (reader.ElementType == typeof(CellValue))
            {
                text = reader.GetText();
                Console.Write(text + " ");
            }
        }
        Console.WriteLine();
        Console.ReadKey();
    }
}

If I read in the "text" and wish to replace this:

using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filePath, true))
{
    Document document = wordDoc.MainDocumentPart.Document;
    OpenXmlReader reader = OpenXmlReader.Create(document);
    while (reader.Read())
    {
        //OpenXmlElement element = reader.LoadCurrentElement();
        //text = element.InnerText;
        text = reader.GetText();
    }
}

OpenXmlWriter expects an OpenXmlPart or Stream when you create it.

OpenXmlWriter writer = OpenXmlWriter.Create(#);
writer.WriteStartElement(reader);
writer.WriteElement(#);
writer.WriteEndElement();
writer.Close();

What is the supported approach/method for this?

Using the DOM approach on large files can cause memory exceptions.


There are a number of blog posts documenting how to use the OpenXmlWriter but this is for creating new files or adding new elements to an existing file, not updating existing data.

Parsing and Reading Large Excel Files with the Open XML SDK http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx [Dead Link] https://web.archive.org/web/20151205145806/http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx

Writing Large Excel Files with the Open XML SDK http://blogs.msdn.com/b/brian_jones/archive/2010/06/22/writing-large-excel-files-with-the-open-xml-sdk.aspx [Dead Link] https://web.archive.org/web/20160216062257/http://blogs.msdn.com/b/brian_jones/archive/2010/06/22/writing-large-excel-files-with-the-open-xml-sdk.aspx

Performance issue while reading/writing large excel files using OpenXML SDK http://tech-turf.blogspot.com/2015/10/performance-issue-while-readingwriting.html

How to read and write Excel cells with OpenXML and C# http://fczaja.blogspot.com/2013/05/how-to-read-and-write-excel-cells-with.html

How to properly use OpenXmlWriter to write large Excel files http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/

AlexHedley avatar Mar 06 '19 11:03 AlexHedley

Stale issue message

github-actions[bot] avatar May 14 '20 00:05 github-actions[bot]

Hi @twsouthwick I’ve see you comment on other issues for this repo, would it be possible for a comment on this?

Also is there a reason why this would be auto closed with no interactions?

I know other questions have been suggested to moved to Stack Overflow, could instead the new Discussions tab be added and it moved to there instead, if you don’t think this is an Issue,

Thanks

AlexHedley avatar May 21 '20 20:05 AlexHedley

Sorry for the auto close. I enabled a bot that went through and closed issues that had no comments on it. Happy to reopen and take a look.

twsouthwick avatar May 22 '20 03:05 twsouthwick

@twsouthwick if you have some time that would be great, thanks.

AlexHedley avatar Jul 13 '20 10:07 AlexHedley

+1 for this - I am currently faced with a presentation containing lots of vector graphics that ends up allocating 15.000.000 objects in memory using the DOM approach. This basically makes our product unusable. I would also love to hear about workaround to avoid loading everything to memory before it is possible to do manipulation on elements.

sorensenmatias avatar Oct 05 '20 17:10 sorensenmatias

I'm going to close this in favor of #1193 since that has a proposal for this approach. Please reopen if you think it's different.

twsouthwick avatar Feb 03 '23 17:02 twsouthwick

@twsouthwick I'm not sure how reading the last element would allow for replacing an item anywhere in the document, unless I'm missing something?

AlexHedley avatar Feb 03 '23 18:02 AlexHedley

It may not solve your need - the title had SAX in it and was trying to consolidate efforts and issues. Sounds like different scenarios at play

twsouthwick avatar Feb 03 '23 20:02 twsouthwick

Finally, as I couldn't find a workaround for this bug, I gave a chance to the LargeXlsx library (https://github.com/salvois/LargeXlsx) and after running several tests in a development environment, I deployed the application to a production environment where I always received the OutOfMemory exception on Azure, and since then it has not occurred again despite generating Excel files with hundreds of thousands of records. The library is easy to use and the documentation is good.

In summary, due to the need to solve this problem, I had to look for alternatives, and fortunately, they were successful. If you have any questions, do not hesitate to ask. Regards.

rsdelapaz avatar Mar 03 '23 11:03 rsdelapaz