Open-XML-SDK icon indicating copy to clipboard operation
Open-XML-SDK copied to clipboard

DocumentType Strict does not exist

Open Asbjoedt opened this issue 3 years ago • 6 comments

Description

DocumentType Strict does not exist if you want to change document type. See available options for changing document types for spreadsheets:

https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheetdocumenttype?view=openxml-2.8.1

Information

  • DocumentFormat.OpenXml Version: (ie 2.7.2)

Repro

// Please add a self-contained, minimum viable repro of the issue.
// If you require external resources, please provide a gist or GitHub repro
// An Xunit style test is preferred, but a console application would work too.

Observed

No Strict DocumentType exist for changing DocumentType.

Expected

You can save files as Strict in the file type dropdown menu in Office, but you cannot select Strict as a DocumentType in the SDK.

Asbjoedt avatar Jul 22 '22 12:07 Asbjoedt

Hi @Asbjoedt ,

What are you trying to do? Are you trying to create documents with the SDK and set them as strict mode? If you are trying to create documents with the SDK, could you post the code you are trying?

mikeebowen avatar Jul 27 '22 22:07 mikeebowen

Hi @mikeebowen

Are you trying to create documents with the SDK and set them as strict mode?

Yes! I asked the same question on Stackoverflow 3 months ago, but I have received no replies: https://stackoverflow.com/questions/71940270/how-to-programmatically-create-ooxml-strict-documents

If you are trying to create documents with the SDK, could you post the code you are trying? I have tried two codes.

Code 1. Convert to DocumentType Workbook and do nothing else. "ChangeDocumentType" -> "Workbook" defaults to Transitional conformance, which is fine. But how to make it Strict?

using using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

byte[] byteArray = File.ReadAllBytes(input_filepath);
            using (MemoryStream stream = new MemoryStream())
            {
                stream.Write(byteArray, 0, (int)byteArray.Length);
                using (SpreadsheetDocument spreadsheetDoc = SpreadsheetDocument.Open(stream, true))
                {
                    spreadsheetDoc.ChangeDocumentType(SpreadsheetDocumentType.Workbook);
                }
                File.WriteAllBytes(output_filepath, stream.ToArray());
            }

Code 2. Convert using SaveAs(). This (makes sense) also defaults to Transitional (however when you input a Strict file to convert it to Transitional, the new file will create an OOXML validator error related to namespaces (I will test this bug again and create a new issue for it, if it is repro))

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

                using (SpreadsheetDocument spreadsheetDoc = SpreadsheetDocument.Open(input_filepath, true))
                {
                    SpreadsheetDoc.SaveAs(output_filepath)
                }

What do you do to transform a Transitional document preferably a spreadsheet from Transitional to Strict conformance? In Office it is as simple as selecting Strict file format. In SDK, this option does not exist, and any conformance transform would have to be coded manually. In Office Strict conformance can only be selected for non-macro and non-template file formats meaning (docx, xlsx, pptx) and not e.g. xltx, xlsm etc. These should have separate "ChangeDocumentType" options also, or another way of specifying conformance when using "ChangeDocumentType".

Does this make sense?

It is a big issue for us as an archive, that we do not have a programmatic way to transform Transtional to Strict conformance (I am not a programmer, but I have not found any libraries or programs that can help us do this).

Thanks for listening.

Asbjoedt avatar Aug 01 '22 09:08 Asbjoedt

Hi @Asbjoedt ,

Unfortunately, the StrictRelationshipFound property is read only, so it can be used to create a report, but it can't edit the attribute to make it strict. I'm not sure if it's possible another way, but I will look into it.

mikeebowen avatar Aug 03 '22 18:08 mikeebowen

Hi again

Regarding spreadsheets.

Changing conformance class can be done with:

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

using (var spreadsheet = SpreadsheetDocument.Open(input_filepath, true))
{
    Workbook workbook = spreadsheet.WorkbookPart.Workbook;
    if (workbook.Conformance == null || workbook.Conformance != "strict")
    {
        workbook.Conformance.Value = ConformanceClass.Enumstrict;
    }
}

Changing from Transitional to Strict involves several other things. Importantly, it involves changing many namespaces. Here's a non-exhaustive (incomplete) list of namespaces for Strict, which for every single one must be changed in the spreadsheet.

http://purl.oclc.org/ooxml/spreadsheetml/main
http://purl.oclc.org/ooxml/officeDocument/relationships
http://purl.oclc.org/ooxml/officeDocument/extendedProperties
http://purl.oclc.org/ooxml/officeDocument/docPropsVTypes
http://purl.oclc.org/ooxml/officeDocument/relationships/styles
http://purl.oclc.org/ooxml/officeDocument/relationships/theme
http://purl.oclc.org/ooxml/officeDocument/relationships/worksheet
http://purl.oclc.org/ooxml/officeDocument/relationships/sharedStrings
http://purl.oclc.org/ooxml/officeDocument/relationships/externalLink
http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument
http://purl.oclc.org/ooxml/officeDocument/relationships/externalLinkPath
http://purl.oclc.org/ooxml/officeDocument/relationships/oleObject
http://purl.oclc.org/ooxml/drawingml/main
http://purl.oclc.org/ooxml/drawingml/spreadsheetDrawing
urn:schemas-microsoft-com:vml

I am trying to write my own code to amend all these namespaces but it is a manual and time-consuming process.

Changing from Transitional to Strict also means not accepting the old drawing VML but instead accepting DrawingML and not accepting certain Japanese characters.

Basically my company needs an easy way to change from Transitional to Strict. It can be done with Excel Interop dependency using this code, but we want to use a self-dependent application only using Open XML SDK:

Excel.Application app = new Excel.Application(); // Create Excel object instance
Excel.Workbook wb = app.Workbooks.Open(input_filepath); // Create workbook instance
wb.SaveAs(output_filepath, 61); // Save workbook as .xlsx Strict
wb.Close(); // Close workbook
app.Quit(); // Quit Excel application

If Open XML SDK could support an easy way to do all of the above-mentioned changes all at once with one or a few lines of code, this would meet an important business requirement for long-term archiving/preservation of spreadsheets.

Asbjoedt avatar Sep 06 '22 10:09 Asbjoedt

There doesn't seem to be anything specifically available to do this OOB. However, I bet you could customize the IOpenXmlNamespaceResolver to do this for you. This is an internal type, so you'll need to clone the repo to try it out.

Some pseudo code that will most definitely not compile and should only set you on the right track, would be:

using var doc = SpreadsheetDocument.Open(...);
doc.ForceStrictNamespaces();

doc.Save(...)


public static void ForceStrictNamespaces(this OpenXmlPackage package)
{
  var existing = doc.Features.Get<IOpenXmlnamespaceResolver>());
  doc.Features.Set<IOpenXmlNamespaceResolver>(new ForceStrictNamespace(existing));
}

class ForceStrictNamespace : IOpenXmlNamespaceResolver
{
  public ForceStrictNamespace(IOpenXmlNamespaceResolver other)
  { 
    ...
  }

  ... // Implement it to give back that strict namespaces
}

Happy to take a PR that enables this :) Please let us know any design proposals before pushing a PR

twsouthwick avatar Sep 07 '22 02:09 twsouthwick

Hi @twsouthwick

Thanks for your reply and the guidance in how to move forward. I cloned the repo and looked at this code briefly. I don't think I can deliver any code to you in a level acceptable, because I have only learned to program recently and these concepts with Interfaces is still difficult for me to write.

I can propose the following code (of which some is pseudo-code because I need you to point me in the right direction).

I propose to make a list of items each with prefix, transitional namespace and strict namespace. This is a kind of dictionary. I show the code below.

public class namespaceIndex
{
    public string Prefix { get; set; }

    public string Transitional { get; set; }

    public string Strict { get; set; }

    public static List<namespaceIndex> Create_Namespaces_Index()
    {
        List<namespaceIndex> list = new List<namespaceIndex>();

        // xmlns
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/spreadsheetml/2006/main", Strict = "http://purl.oclc.org/ooxml/spreadsheetml/main" });
        // docProps/
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties", Strict = "http://purl.oclc.org/ooxml/officeDocument/extendedProperties" });
        // docProps/vt
        list.Add(new namespaceIndex() { Prefix = "vt", Transitional = "http://purl.oclc.org/ooxml/officeDocument/docPropsVTypes", Strict = "http://purl.oclc.org/ooxml/officeDocument/docPropsVTypes" });
        // relationships/r
        list.Add(new namespaceIndex() { Prefix = "r", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships" });
        // relationship/styles
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/styles" });
        // relationship/theme
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/theme" });
        // relationship/worksheet
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/worksheet" });
        // relationship/sharedstrings
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/sharedStrings" });
        // relationship/externalLink
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLink", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/externalLink" });
        // relationship/officeDocument
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument" });
        // relationship/externallink/externalLinkPath
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLinkPath", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/externalLinkPath" });
        // relationship/oleObject
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/oleObject" });
        // relationship/image
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/image" });
        // relationship/video
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/video", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/video" });
        // relationship/pivotCacheDefininition
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/pivotCacheDefinition", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/pivotCacheDefinition" });
        // relationship/pivotCache Records
        list.Add(new namespaceIndex() { Prefix = "", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/pivotCacheRecords", Strict = "http://purl.oclc.org/ooxml/officeDocument/relationships/pivotCacheRecords" });
        // drawingml/a
        list.Add(new namespaceIndex() { Prefix = "a", Transitional = "http://schemas.openxmlformats.org/drawingml/2006/main", Strict = "http://purl.oclc.org/ooxml/drawingml/main" });
        // drawingml/xdr
        list.Add(new namespaceIndex() { Prefix = "xdr", Transitional = "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing", Strict = "http://purl.oclc.org/ooxml/drawingml/spreadsheetDrawing" });
        // drawingml/chart
        list.Add(new namespaceIndex() { Prefix = "c", Transitional = "http://schemas.openxmlformats.org/drawingml/2006/chart", Strict = "http://purl.oclc.org/ooxml/drawingml/chart" });
        // customXml/ds
        list.Add(new namespaceIndex() { Prefix = "ds", Transitional = "http://schemas.openxmlformats.org/officeDocument/2006/customXml", Strict = "" });
        // urn for Strict - NO NAMESPACE FOR TRANSITIONAL
        list.Add(new namespaceIndex() { Prefix = "v", Transitional = "", Strict = "urn:schemas-microsoft-com:vml" });
        // docProps/core.xml - NO NAMESPACE FOR TRANSITIONAL
        list.Add(new namespaceIndex() { Prefix = "dc", Transitional = "", Strict = "http://purl.org/dc/elements/1.1/" });
        // docProps/core.xml - NO NAMESPACE FOR TRANSITIONAL
        list.Add(new namespaceIndex() { Prefix = "dcterms", Transitional = "", Strict = "http://purl.org/dc/terms/" });
        // docProps/core.xml - NO NAMESPACE FOR TRANSITIONAL
        list.Add(new namespaceIndex() { Prefix = "dcmitype", Transitional = "", Strict = "http://purl.org/dc/dcmitype/" });

        return list;
    }
}

I then propose to make a method that can open a spreadsheet, register all existing namespaces in a dictionary and then search in our above-mentioned list of namespaces for any namespace to convert to.

I show the code below. However, two components are pseudo code.

public void Convert_Transitional_to_Strict(string input_filepath)
{
    // Create list of namespaces
    List<namespaceIndex> namespaces = namespaceIndex.Create_Namespaces_Index();

    using (var spreadsheet = SpreadsheetDocument.Open(input_filepath, true))
    {
        WorkbookPart wbPart = spreadsheet.WorkbookPart;
        DocumentFormat.OpenXml.Spreadsheet.Workbook workbook = wbPart.Workbook;
        // If Transitional
        if (workbook.Conformance == null || workbook.Conformance != "strict")
        {
            // Create dictionary of existing namespaces
            Dictionary<string, string> existingNamespaces = spreadsheet
                .GetAllParts()
                .NamespaceDeclarations
                .ToDictionary();

            // Change conformance class
            workbook.Conformance.Value = ConformanceClass.Enumstrict;

            // Add vml urn namespace to workbook.xml
            workbook.AddNamespaceDeclaration("v", "urn:schemas-microsoft-com:vml");

            // Change namespaces
            foreach (KeyValuePair<string, string> existingNamespace in existingNamespaces)
            {
                foreach (namespaceIndex name in namespaces)
                {
                    if (name.Transitional == existingNamespace.Value)
                    {
                        existingNamespace.Value = name.Strict;
                    }
                }
            }
        }
    }
}

The below part is pseudo code. I don't know any way to search for all namespaces in a spreadsheet and register them to a dictionary. Using LINQ could be the way to move forward. Do you have any suggestion for me how to code a dictionary with all namespaces in a spreadsheet?

// Create dictionary of existing namespaces
Dictionary<string, string> existingNamespaces = spreadsheet
    .GetAllParts()
    .NamespaceDeclarations
    .ToDictionary();

Secondly, the below is also pseudo code. I need a simple way to get and set any namespace registered in the dictionary of existing namespaces in the spreadsheet. I know of .AddNamespaceDeclations() and .RemoveNamespaceDeclarations(), but these must be hardcoded for each namespace, you want to change. I need a more generic way to do this in a foreach loop.

if (name.Transitional == existingNamespace.Value)
{
    existingNamespace.Value = name.Strict;
}

Are these two pseudo code snippets, something that is possible to do with Open XML SDK and can you point me in the right direction?

Asbjoedt avatar Sep 10 '22 10:09 Asbjoedt