Open-XML-SDK
Open-XML-SDK copied to clipboard
Misc element properties are not lining up with their ECMA definitions
Before submitting an issue, please fill this out
Is this a:
- [X] Issue with the OpenXml library
- [ ] Question on library usage
Description
The DocumentFormat.OpenXml.WordProcessing.OnOffOnlyValues enum only contains values On and Off but if I am reading the ECMA-376 standard correctly, properties with this value type may contain values On, Off, 0, 1, true, and false. When values, such as true or false, are present for an xml attribute for this type, the Value property of the EnumValue<OnOffOnlyType> object throws a System.FormatException exception.
The attached document contains an element /document[0]/tbl[0]/tr[0]/trPr[0]/cantSplit which contains the property in question.
file-sample_500kB.docx
I'm open to discuss this further but it doesn't seem like this should be the correct behavior.
Information
- .NET Target: .NET Framework 4.7.2
- DocumentFormat.OpenXml Version: 2.11.3
Repro
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System;
using System.IO;
using System.Linq;
namespace ConsoleApp2
{
class Program
{
static void Main(string[] args)
{
// Change to path of test file
const string pathToFile = @"C:\path\to\file-sample_500kB.docx";
using (var f = new FileStream(pathToFile, FileMode.Open))
{
using (var docx = WordprocessingDocument.Open(f, false))
{
var sample = docx.MainDocumentPart.Document.Descendants<CantSplit>().FirstOrDefault();
if (sample != null)
{
var val = sample.Val;
Console.WriteLine($"Sample has value: {val.HasValue}");
Console.WriteLine($"Sample inner text: {val.InnerText}");
try
{
Console.WriteLine($"Sample value: {val.Value}");
}
catch (Exception x)
{
Console.WriteLine($"Sample value threw {x.GetType().Name}");
Console.WriteLine($"Sample value exception message: {x.Message}");
}
}
}
}
Console.WriteLine("Press any key to quit");
Console.ReadKey(true);
}
}
}
Observed
Sample has value: False
Sample inner text: false
Sample value threw FormatException
Sample value exception message: The text value is not a valid enumeration value.
Press any key to quit
Expected
Sample has value: True
Sample inner text: false
Sample value: DocumentFormat.OpenXml.Wordprocessing.OnOffOnlyValues.False or DocumentFormat.OpenXml.Wordprocessing.OnOffOnlyValues.Off
Press any key to quit
FYI - I caught this while testing out the Serialize.OpenXml.CodeGen library.
Interesting. @tomjebo can you take a look at the spec and weigh in here?
@rmboggs Can you share the section of the spec you were looking at?
I was referencing the first revision of the open xml ecma standard located here. Pdf Page 308 - cantSplit definition which references the possible values defined by the ST_OnOff simple type Pdf Page 1786 - ST_OnOff possible values which include the values I listed in the initial issue report above.
Please let me know if there are more questions.
Actually, I dug a bit deeper here as there are other elements that also have the same possible values. DocumentFormat.OpenXml.WordProcessing.Caps for example. It has the same set of possible values as the DocumentFormat.OpenXml.WordProcessing.CantSplit but they inherit from different base classes. Caps inherits from DocumentFormat.OpenXml.Wordprocessing.OnOffType while CantSplit inherits from DocumentFormat.OpenXml.Wordprocessing.OnOffOnlyType. Since it appears that OnOffType offers the appropriate values for this kind of element, I believe that changing CantSplit to inherit from OnOffType would solve this issue rather than rewrite the OnOffOnlyType class. I'm hoping this won't be a shocking/breaking change.
My gut says that's a breaking change. I'll dig into it too see. We'll probably do a v3 soon and can handle any breaking changes then
It makes sense to be cautious. I don't think this is urgent since the cantSplit class has been set this way for quite some time and the difference is being noticed now, in a non production environment. I have an idea how to get around this in my project so I should be ok. In the meantime, would it be ok to label this for the 3.0 release so it doesn't get lost.
@twsouthwick @rmboggs I think we have a mistake in the schema processing (backend). cantSplit looks like it was defined in our backend processing as CT_OnOffOnly which is not correct based on the ISO definition or what I see in the Office source schemas. I see it actually associated with both CT_OnOffOnly and CT_OnOff in our backend code and the former looks like it's overriding the latter. I think this is wrong. I'm not sure if correcting this to CT_OnOff would be breaking as it would be expanding the possible values, not taking any away. I'll need to check the other elements that are set to this type as well.
Thanks @tomjebo. Please let me know if you need more information on this.
This is definitely a v3.0 change as it will be a breaking change. We'll track this and get that in.
Thanks guys, I have a work around for my stuff in the meantime.
It's not the prettiest work around but it will do.
Hi,
I'm finding other non-enumvalue type properties that do not seem to be matching up to their values in some of the sample documents that I am testing against. Since they seem to be related to schema mismatches at first glance similar to this, should I append what I find to this issue or create a new one to keep things clean? Please let me know.
Yes, let's make this an uber-issue for those kind of issues. Then we can track it for v3.0
Ok, let me see if I can modify the title to be more broad in a bit.
In the meantime, I'm seeing more inconsistencies in the sample docx file mentioned at the beginning of this issue when compared to the ecma standard. First one is for the DocGrid.CharacterSpace property. The property type is set to Int32Value but the sample document attached has 4294961151 as its value in the SectionProperties element, which is out of range for that type.. When I check the ecma standard, it says the values for this type should be a positive/negative decimal number (defined in 2.18.16 ST_DecimalNumber (Decimal Number Value)) which is pretty vague in terms of what the minimum/maximum values should be. On the flip side, DivId.Val, which has the same definition as the DocGrid.CharacterSpace property in the ecma standard (2.18.16 ST_DecimalNumber (Decimal Number Value)), is setup as a StringValue type in the SDK. I have yet to check out the actual code for these two properties but if they are generated the same way as the CantSplit type, then there is another issue with the generation process. Chances are that the disconnect between the two properties is due to the ambiguity of the ecma schema definition but it should be looked at, imho.
Please let me know if more details are needed for this.