Open-XML-SDK
Open-XML-SDK copied to clipboard
Element classes do not contain all child element-related properties
Description
The DocumentFormat.OpenXml.Spreadsheet.Worksheet class only has properties for 4 out of 38 child elements. For example, there is no property for the only mandatory child element, x:sheetData, which is likely among the most frequently referenced elements.
As per the schema, those child elements have to be in a defined order, which, in the absence of SDK support, needs to be ensured by the caller's code. That increases the size and complexity of the code.
Information
- .NET Target: all
- DocumentFormat.OpenXml Version: 3cc26570f2964a5d56a5988f9464b2e4b29812c7
Observed
The Worksheet class defines the following child elements:
[ChildElementInfo(typeof(SheetProperties))]
[ChildElementInfo(typeof(SheetDimension))]
[ChildElementInfo(typeof(SheetViews))]
[ChildElementInfo(typeof(SheetFormatProperties))]
[ChildElementInfo(typeof(Columns))]
[ChildElementInfo(typeof(SheetData))]
[ChildElementInfo(typeof(SheetCalculationProperties))]
[ChildElementInfo(typeof(SheetProtection))]
[ChildElementInfo(typeof(ProtectedRanges))]
[ChildElementInfo(typeof(Scenarios))]
[ChildElementInfo(typeof(AutoFilter))]
[ChildElementInfo(typeof(SortState))]
[ChildElementInfo(typeof(DataConsolidate))]
[ChildElementInfo(typeof(CustomSheetViews))]
[ChildElementInfo(typeof(MergeCells))]
[ChildElementInfo(typeof(PhoneticProperties))]
[ChildElementInfo(typeof(ConditionalFormatting))]
[ChildElementInfo(typeof(DataValidations))]
[ChildElementInfo(typeof(Hyperlinks))]
[ChildElementInfo(typeof(PrintOptions))]
[ChildElementInfo(typeof(PageMargins))]
[ChildElementInfo(typeof(PageSetup))]
[ChildElementInfo(typeof(HeaderFooter))]
[ChildElementInfo(typeof(RowBreaks))]
[ChildElementInfo(typeof(ColumnBreaks))]
[ChildElementInfo(typeof(CustomProperties))]
[ChildElementInfo(typeof(CellWatches))]
[ChildElementInfo(typeof(IgnoredErrors))]
[ChildElementInfo(typeof(Drawing))]
[ChildElementInfo(typeof(LegacyDrawing))]
[ChildElementInfo(typeof(LegacyDrawingHeaderFooter))]
[ChildElementInfo(typeof(DrawingHeaderFooter))]
[ChildElementInfo(typeof(Picture))]
[ChildElementInfo(typeof(OleObjects))]
[ChildElementInfo(typeof(Controls))]
[ChildElementInfo(typeof(WebPublishItems))]
[ChildElementInfo(typeof(TableParts))]
[ChildElementInfo(typeof(WorksheetExtensionList))]
However, the Worksheet class only provides the following properties:
/// <summary>
/// <para> SheetProperties.</para>
/// <para> Represents the following element tag in the schema: x:sheetPr </para>
/// </summary>
/// <remark>
/// xmlns:x = http://schemas.openxmlformats.org/spreadsheetml/2006/main
/// </remark>
public SheetProperties SheetProperties
{
get => GetElement<SheetProperties>(0);
set => SetElement(0, value);
}
/// <summary>
/// <para> SheetDimension.</para>
/// <para> Represents the following element tag in the schema: x:dimension </para>
/// </summary>
/// <remark>
/// xmlns:x = http://schemas.openxmlformats.org/spreadsheetml/2006/main
/// </remark>
public SheetDimension SheetDimension
{
get => GetElement<SheetDimension>(1);
set => SetElement(1, value);
}
/// <summary>
/// <para> SheetViews.</para>
/// <para> Represents the following element tag in the schema: x:sheetViews </para>
/// </summary>
/// <remark>
/// xmlns:x = http://schemas.openxmlformats.org/spreadsheetml/2006/main
/// </remark>
public SheetViews SheetViews
{
get => GetElement<SheetViews>(2);
set => SetElement(2, value);
}
/// <summary>
/// <para> SheetFormatProperties.</para>
/// <para> Represents the following element tag in the schema: x:sheetFormatPr </para>
/// </summary>
/// <remark>
/// xmlns:x = http://schemas.openxmlformats.org/spreadsheetml/2006/main
/// </remark>
public SheetFormatProperties SheetFormatProperties
{
get => GetElement<SheetFormatProperties>(3);
set => SetElement(3, value);
}
Expected
Given that the child elements must be in a defined order, the Worksheet class defines properties for all child elements.
@twsouthwick I wouldn't say this is a bug, rather a missing feature. And the same is true for a good number of other classes.
The question is whether the SDK should be complete in the sense that it provides properties for all attributes, as well as child and parent relationships , so that you can properly navigate.
This should not only include relationships between OpenXmlElements but also part relationships that allow you to navigate back and forth between parent and child parts (e.g., between WorkbookPart and its WorksheetPart children. In the moment, I can't get from a WorksheetPart to its parent WorkbookPart, which would be nice as it could reduce the number of parameters I have to pass to methods requiring access to both.
Thanks @ThomasBarnekow . I agree this would be better labeled as an enhancement. I would love to see the SDK provide full support for navigation around a document and not require people to manually adjust XML, especially for simple tasks.
/cc @tomjebo @tarunchopra
@ThomasBarnekow I've been looking at cleaning up the schema generation and was trying to understand about this. I'm still trying to wrap my head around the actual OpenXml spec and not sure what you mean by the properties having an order. I'm looking at the ISO 29500 spec and not sure where it's indicating a specified order. Can you point me to that part of the spec?
Ah I see it in the actual schema with sequence vs all
@twsouthwick, this is also related to #627 and #620.
As part of the current release, I've been taking care of a ton of technical debt, including making the validation system more transparent. As part of this, I created a new data structure to store the validation info for child elements. For the example you started with, Worksheet, it looks like this:
private static readonly ParticleConstraint _constraint = new CompositeParticle(ParticleType.Sequence, 1, 1)
{
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.SheetProperties), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.SheetDimension), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.SheetViews), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.SheetFormatProperties), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.Columns), 0, 0),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.SheetData), 1, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.SheetCalculationProperties), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.SheetProtection), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.ProtectedRanges), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.Scenarios), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.AutoFilter), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.SortState), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.DataConsolidate), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.CustomSheetViews), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.MergeCells), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.PhoneticProperties), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.ConditionalFormatting), 0, 0),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.DataValidations), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.Hyperlinks), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.PrintOptions), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.PageMargins), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.PageSetup), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.HeaderFooter), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.RowBreaks), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.ColumnBreaks), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.CustomProperties), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.CellWatches), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.IgnoredErrors), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.Drawing), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.LegacyDrawing), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.LegacyDrawingHeaderFooter), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.DrawingHeaderFooter), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.Picture), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.OleObjects), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.Controls), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.WebPublishItems), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.TableParts), 0, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Spreadsheet.WorksheetExtensionList), 0, 1)
};
As you can see, the strongly typed properties stop at SheetFormatProperties as the next element is Columns which has maxOccurs = 0 which means infinite. The current way elements are parsed gets stuck on that infinite possibility and doesn't tackle it and ignores anything in the sequence beyond it. What we need to come up with is a way to handle that; there are more complicated examples, but I think Worksheet is a good starter.
I'm wondering if there's a way we can combine this particle dataset with the element properties. It would simplify a lot of the logic internally and potentially make it easier to see all elements. I'll keep thinking about that, as it may be the next thing feature wise to tackle.
Thoughts and feedback to API design usage would be helpful. I can foresee a couple options:
- Extend the current property system to expose the elements similar to how they are currently exposed.
- Expose a single, more robust,
ChildElementsproperty that can take a type and correctly insert it in order. This would be simpler as the generator (which is really difficult to work with) wouldn't be affected. This would look like the following:
openXmlElement.Children<Columns>.Add(new Column(...));
openXmlElement.Children<Columns>.Add(new Column(...)); // for multiple instances
openXmlElement.Children<SheetFormatProperties>.Set(new SheetFormatProperties(...)); // for single instances
A challenge with option 2 (which would be a lot easier to implement I believe), are types that have complicated sequences and options. An example of that would be UnsizedSplitButton where the constraints look like this:
private static readonly ParticleConstraint _constraint = new CompositeParticle(ParticleType.Sequence, 1, 1)
{
new CompositeParticle(ParticleType.Sequence, 0, 1)
{
new CompositeParticle(ParticleType.Choice, 0, 1)
{
new ElementParticle(typeof(DocumentFormat.OpenXml.Office.CustomUI.VisibleButton), 1, 1),
new ElementParticle(typeof(DocumentFormat.OpenXml.Office.CustomUI.VisibleToggleButton), 1, 1)
},
new ElementParticle(typeof(DocumentFormat.OpenXml.Office.CustomUI.UnsizedMenu), 1, 1)
}
};
For now, let's get a conversation started about how we want this to look.
@ThomasBarnekow I have a POC of what this may look like. The data structure #676 provides would allow for either of the options I listed above. For now, the change will introduce a new storage/retrieval mechanism so that current usage will start using that. This may have bugs, that I want to flesh out before supporting easy access to the rest of the elements (ie I don't want to break current behavior).
We'll probably start with (2) to test things out before fully exposing all the other properties. The current implementation is hard coded with indices, and this will break that so it's more flexible. However, (2) will allow us to gradually test it's behavior.
What I'm thinking currently is to expose the following internal methods:
TElement OpenXmlCompositeElement.GetElement<TElement>()bool OpenXmlCompositeElement.SetElement<TElement>(TElement element)IList<TElement> GetChildren<TElement>()
The last one I'm not sure about and would like to mark it in some way as experimental. It may be made public with a [EditorBrowsable(EditorBrowsableState.Never)], or with [Obsolete] - ideas welcome. Mainly, the goal is to have an API that is similar to AppendChild, but honors the correct order so the developer doesn't have to care or worry about it.
@twsouthwick Let me ask one question first to provide some food for thought. When you say that you want an API similar to AppendChild in conjunction with not wanting to break current behavior, does that also mean you don't want to fix "incorrect" behavior?
For example, while AppendChild does nothing wrong on a technical level, it makes it easy to do the wrong thing on an Open XML schema level. For example, when you want to add a Paragraph child element to a Body parent element that already has a SectionProperties child element, using AppendChild adds the Paragraph after the SectionProperties, leading to invalid markup.
Thus, why should AppendChild not do the right thing and add the Paragraph just before the SectionProperties as required by the WordprocessingML schema (see extract below; elements ordered for top-down reading).
<xsd:complexType name="CT_Body">
<xsd:sequence>
<xsd:group ref="EG_BlockLevelElts" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="sectPr" minOccurs="0" maxOccurs="1" type="CT_SectPr"/>
</xsd:sequence>
</xsd:complexType>
<xsd:group name="EG_BlockLevelElts">
<xsd:choice>
<xsd:group ref="EG_BlockLevelChunkElts" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="altChunk" type="CT_AltChunk" minOccurs="0" maxOccurs="unbounded"/>
</xsd:choice>
</xsd:group>
<xsd:group name="EG_BlockLevelChunkElts">
<xsd:choice>
<xsd:group ref="EG_ContentBlockContent" minOccurs="0" maxOccurs="unbounded"/>
</xsd:choice>
</xsd:group>
<xsd:group name="EG_ContentBlockContent">
<xsd:choice>
<xsd:element name="customXml" type="CT_CustomXmlBlock"/>
<xsd:element name="sdt" type="CT_SdtBlock"/>
<xsd:element name="p" type="CT_P" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="tbl" type="CT_Tbl" minOccurs="0" maxOccurs="unbounded"/>
<xsd:group ref="EG_RunLevelElts" minOccurs="0" maxOccurs="unbounded"/>
</xsd:choice>
</xsd:group>
I am with you that we should not change AppendChild "just like that." However, we could think about modes of operation or opting into new behaviors. Most developers would likely expect the Open XML SDK to do the right thing.
Adding a separate API is, of course, a safe option. However, it will require good documentation, in particular for inexperienced developers.
@twsouthwick Regarding the internal methods you want to expose, I have only found the following two methods in OpenXmlCompositeElement:
private protected void SetElement<T>(int sequenceNumber, T newChild) where T : OpenXmlElement
private protected T GetElement<T>(int sequenceNumber) where T : OpenXmlElement
Here are some questions:
- Will you add methods without the
sequenceNumber? - What would be the difference between
GetElement<T>()andOpenXmlElement.GetFirstChild<T>()? - Apart from the return type, what would be the difference between
IList<TElement> GetChildren<TElement>()andIEnumerable<T> OpenXmlElement.Elements<T>()?
@ThomasBarnekow Thanks for your thoughts. That's actually a good argument about updating the AppendChild behavior. It could be hidden behind an AppContext switch if needed. Definitely worth exploring.
As for your other questions:
Will you add methods without the sequenceNumber?
This has been removed in the updated implementation. All information about where an element is located is retrieved from the composite particle information.
What would be the difference between GetElement<T>() and OpenXmlElement.GetFirstChild<T>()?
No difference. Probably just surface it through that (actually, the implementation is the same, I think). Good catch.
Apart from the return type, what would be the difference between IList<TElement> GetChildren<TElement>() and IEnumerable<T> OpenXmlElement.Elements<T>()?
I had envisioned it as a way to add additional children. However, I have reconsidered that with your comments. These additional APIs may not be needed and other, existing APIs could be used instead and just be updated to be more intelligent.
Stale issue message
At this point, I'm thinking about exposing the following API:
public class OpenXmlCompositeElement
{
public bool TryAppendChild(OpenXmlElement child, AppendOption option)
{
if(option == AppendOption.List)
{
AppendChild(child);
return true;
}
return SetElement(child);
}
}
public enum AppendOption
{
List,
Schema
}
This way, people could opt into the new behavior but have it configurable. Is this sufficient as a first step? Is this API useable?
@twsouthwick As requested in #774, here are my thoughts on the API above.
My initial thought was "yeah, this would do the job". Then I thought that calling it TryAppendChild might be (not entirely sure) a bit misleading because it is not simply the counterpart of AppendChild in the usual TryDoStuff pattern.
Would it be possible to use the following API (noting the optional parameter)?
public class OpenXmlCompositeElement
{
public bool AppendChild(OpenXmlElement child, AppendOption option = AppendOption.List)
{
if(option == AppendOption.List)
{
AppendChildCore(child); // The old AppendChild.
}
if (!SetElement(child))
{
throw new SomePertinentException("Some description");
}
}
public bool TryAppendChild(OpenXmlElement child, AppendOption option = AppendOption.List)
{
if(option == AppendOption.List)
{
AppendChild(child);
return true;
}
return SetElement(child);
}
}
public enum AppendOption
{
List,
Schema
}
I am not 100% certain that this will not be a breaking change of the AppendChild method. If we could not use optional parameters for AppendChild, we could turn the second parameter of AppendChild into a normal one and create an overload. Overall, we'd have the typical pair of AppendChild and TryAppendChild.
I'd be interested in your view on optional parameters, which I use a lot in my asynchronous APIs, e.g., for the CancellationToken or for an IProgress<string> parameter. With optional parameters and meaningful defaults, I don't have to create a ton of overloads.
I second Thomas on its proposal. This API is more natural.
Thanks for the thoughts!
Adding that to the AppendChild would be a breaking change (adding optional parameters are breaking as it changes the signature of the API). However, we could add an overload that takes an option in. Optional parameters are nice, but difficult to add at later points. If the full use case is known (or the API isn't in a shipping library), I definitely prefer them to minimize the overloading. v3.0 or something we could remove the overloads and just use the default parameters potentially (as a breaking change like that is valid for major versions under SemVer... although we haven't decided how we'll handle that beyond some of the obsoleting we've done).
The reason I went with the TryAppendChild is because it may not add it and hence the Try instead of just AppendChild, and then in the PR I did TryAddChild as (you correctly point out), it's not just a try-analog of the AppendChild. I wanted it to be clear that it may or may not succeed and is a new API. My thought then was to keep it a distinct API (kind of like how inserting/appending are distinct).
So, for completeness, here's the API as added in the PR:
public class OpenXmlCompositeElement
{
public bool TryAddChild(OpenXmlElement child)
{
return SetElement(child);
}
}
No overloads, no options, just a distinct API. I'm fine replacing it with what @ThomasBarnekow recommended, but those were the thoughts I had as I was implementing it.
For example, while
AppendChilddoes nothing wrong on a technical level, it makes it easy to do the wrong thing on an Open XML schema level. For example, when you want to add aParagraphchild element to aBodyparent element that already has aSectionPropertieschild element, usingAppendChildadds theParagraphafter theSectionProperties, leading to invalid markup.
I'm late to the game here, but this newbie got caught up by this.
My use case: I use a Word doc seeded with 35 styles as a template to create nicely-formatted cooking recipes read from a database. I copy this "empty" doc, and append the recipe text to the copy. The styles do all the formatting. It took me awhile to figure out that my paragraphs appended to the end of the copied doc were being added AFTER the SectionProperties already at the end of the document, and that's incorrect. Because I couldn't figure out how to insert new paragraphs BEFORE the existing SectionProperties, what I ended up doing is opening the copy, deleting all the children from the body, appending new parargraphs to my heart's content, then programmatically create and append the SectionProperties at the end. I had hoped to have the template define those section properties rather than code, but it worked.
@twsouthwick, just checking whether this issue is still open. Otherwise, we might want to close it.