NanoXLSX icon indicating copy to clipboard operation
NanoXLSX copied to clipboard

CustomFormatCode - Backslash to escape special format characters

Open SebZar opened this issue 3 years ago • 5 comments
trafficstars

Hi, when creating a custom format, there is a problem with the escaped characters.

Also take a look at the count attribute of numFmts. Its 3 and not 2, as expected.

` style.CurrentNumberFormat.CustomFormatCode = "yyyy-mm-dd HH:mm:ss";

style.CurrentNumberFormat.CustomFormatCode = @"#,##0\ \s"; `

style.xml result:

...
<numFmts count="3">
	<numFmt formatCode="yyyy-mm-dd HH:mm:ss" numFmtId="164"/>
	<numFmt formatCode="#,##0\\ \\s" numFmtId="165"/>
</numFmts>
...

style.xml corrected with Microsoft Excel:

...
<numFmts count="2">
	<numFmt numFmtId="164" formatCode="yyyy\-mm\-dd\ hh:mm:ss"/>
	<numFmt numFmtId="166" formatCode="#,##0\ \s"/>
</numFmts>
...

SebZar avatar Nov 03 '22 19:11 SebZar

Hi, Thank you for the report. It looks like something has changed in the format in the last few weeks, since the current escaping behavior was implemented due to a discovered bug in September. This is already the third issue with internal changes since the Office Update in October. I have first to check the (now) correct escaping behavior.

However, I think the second one "Its 3 and not 2, as expected" is not a bug.

style.CurrentNumberFormat.CustomFormatCode = "yyyy-mm-dd HH:mm:ss";
style.CurrentNumberFormat.CustomFormatCode = @"#,##0\ \s";

You are just overwriting the CustomFormatCode of the same Style instance. You have to create a new style object to add another custom format style.

To fix the escaping will take some days, since I have already several changes to implement and test

rabanti-github avatar Nov 03 '22 20:11 rabanti-github

Hi, I am still working on a solution for the escaping. Not sure how to provide a holistic and error-proof escaping at the moment. Especially due to internationalized date/time formatting, there are dozens or even hundreds (when it comes to valid code references) of particular rules for custom number formats. In the worst case, I have to remove the complete escaping (except XML escaping) for now and the API users have to figure it out according to ECMA OOXML Spec, Part 1, chapter 18.8.31, by themselves. Not a good solution. It will take some additional days to evaluate the whole issue and a good solution.

rabanti-github avatar Nov 06 '22 01:11 rabanti-github

Hi, no hurry.

Perhaps this helps to find a solution:

NPOI (Apache-2.0 license) https://github.com/nissl-lab/npoi/blob/37a8435dc4d613d9cf6145d044e4bf28bdfc3e4e/OpenXmlFormats/Spreadsheet/Styles/CT_NumFmt.cs

EPPlus 4.5 (LGPL) https://github.com/JanKallman/EPPlus/blob/55c5ba6169eaa6ae7867fefb36a84cc0b28be85a/EPPlus/Style/XmlAccess/ExcelNumberFormatXml.cs

ExcelNumberFormat (MIT) https://github.com/andersnm/ExcelNumberFormat

exceljs (MIT) https://github.com/exceljs/exceljs/blob/2ab468b8bae2c8d5844f922d61ac7d960fee26be/lib/xlsx/xform/style/numfmt-xform.js

Writing the format only with XML escaping would be okay for me.

SebZar avatar Nov 08 '22 12:11 SebZar

Hi, Thank you for the additional input. I just released v2.1.0, where the whole escaping (except the XML escaping) was removed. I am quite tired today, therefore I will post an in-depth explanation, what I have done - and why, tomorrow.

rabanti-github avatar Nov 08 '22 22:11 rabanti-github

Ok, here is the follow up.

Analysis

I had a deep dive into custom number formatting. And this is what happens internally:

  • Most characters like x, l or z are escaped by a Backslash --> \x\l\z
  • Values between quotes are not escaped --> "xlz"
  • aaa is not escaped --> aaa
  • A space is escaped as well --> \
  • / is invalid, but with a backslash it's OK --> \/
  • e and E are not escaped since they are used for scientific notation --> e+000 or E-0000
  • mm is not escaped (among other) since it is used to format minutes --> mm
  • jjjj is automatically transformed to yyyy in my German Excel and not escaped since it is used to format a year --> yyyy
  • mmmmm is not escaped since it describes a Month --> mmmmm
  • mmmmmm is automatically fixed to mmmmm by Excel. Don't know what would happen in code. Would the additional m be escaped? Is it invalid? --> mmmmm
  • n seems to be invalid in any case. Don't know even why
  • \, is automatically fixed (unescaped) by Excel to , --> ,
  • [<=100] becomes automatically [<=100]General by Excel --> [<=100]General
  • AM/PM is not escaped since it is used to define a time --> AM/PM
  • Anything within brackets after a $ is not escaped and not checked --> [$RAVEN]
  • こんいちは is escaped on every character --> \こ\ん\い\ち\は
  • 上午 is not escaped since it describes a time --> 上午
  • ปปปป is not escaped since it describes a date or time (?) --> ปปปป
  • b is fixed to bb by Excel and not escaped (is it for date or time?) --> bb

... and so on. I think you see the point. Especially the date and time formatting becomes quite complicated when all possible locales are considered.

Actions

Due to this analysis, and especially due to the localization issues, I decided to remove the whole escaping from the library for now. It would be probably just a very bad and incomplete, or impossible to maintain solution. Therefore, the API users have to figure it out by themselves for the moment :-(

The mentioned libraries may be a solution. However, I try to avoid dependencies as much as possible. This may change. But for now it is what it is. Furthermore, some of them seems also not to be feature complete (Asian date / times). Nevertheless, these libs may be currently a good solution for the API users

What's next?

It is planned to add a format builder in the next major version of the library as a plugin. With this builder (similar to the planned style builder), the most common cases of number formats could be covered. Furthermore there would be still the possibility to override a not yet covered format. However, I cannot exactly say, when this major version is released. There is still a lot of work to do

rabanti-github avatar Nov 09 '22 15:11 rabanti-github