odftoolkit icon indicating copy to clipboard operation
odftoolkit copied to clipboard

Enabling FormatCodeTests

Open svanteschubert opened this issue 7 months ago • 5 comments

Enabling FormatCodeTests and created follow-up issues for the remaining problems, i.e. #370, #371 and by updating a test file (using different dates for different formatting to be certain to have the correct cell) and using pretty printing XML more assertions had to be disabled due to #229

What has been changed:

  1. First of all, I realised that the output was invalid, but so was the input from IBM Lotus Symphony, released October 2010, which omitted the version attribute in the manifest, created #364 to fix invalid files (at least in this case) by adding a version.
    then I had to (re)learn this formatting language for spreadsheet cells, which seems to be implementation-dependent: image see https://docs.oasis-open.org/office/OpenDocument/v1.4/OpenDocument-v1.4-part4-formula.html#TEXT But the ODF implementations describe this formatting language:
  • EXCEL: https://support.microsoft.com/en-us/office/review-guidelines-for-customizing-a-number-format-c0a1d1fa-d3f4-4018-96b7-9c9354dd99f5
  • LibreOffice: https://help.libreoffice.org/24.8/en-GB/text/shared/01/05020301.html?&DbPAR=CALC&System=WIN I still could paste the Excel String "[Blue]#,##0.00_);Red;0.00;"sales "@" in the latest LibreOffice as custom formatting and it worked, so they are not so far away.
    After reading, I realised that the example #0.0 does not make any sense at all, so I exchanged it with #.0
  1. Second, the formats that I received from LibreOffice differ from the expected ones from the tests, so I simply resaved the document and took the date formatting from the file, which looked good, and changed the dates to be able to prove which cell was taken by the content.

  2. Third, I made some Nullpointer sanity checks, but I avoided changing the API, like sometimes an InvalidArgumentException is thrown, when the cell is not of the time type, but if not set, Null is returned. 

svanteschubert avatar May 13 '25 12:05 svanteschubert

Ah, I thought this was only about the format strings. I have changed the code to not use the obsolete pre-Java 8 time classes and the not threadsafe SimpleDateFormatter. The java.time classes already use ISO 8601 out of the box when parsing or calling toString(), so this all gets a lot easier. When using custom format strings, the DateTimeFormatter accepts the same formats as SimpleDateFormatter (plus some added fields), but it is more picky and will refuse incorrect formats that SimpleDateFormatter will silently process.

xzel23 avatar May 14 '25 09:05 xzel23

@xzel23 While I had been waiting for the review for the enabling the format tests, the time refactoring causing some trouble. Fixed one as part of my pan-ultimate commit, but there is one more issue: testSetGetFormat() throws Exception - Unsupported field: MonthOfYear

The test calls seem so uncritical: cell = tablerow.getCellByIndex(3); cell.setTimeValue(currenttime); cell.setFormatString("HH:MM:SS");

Others had the problem as well, but I was not able to fix it quickly and will return latest Wednesday noon to my desk: Could you please take a look at it, Axel?

Unsupported field: MonthOfYear java.time.temporal.UnsupportedTemporalTypeException at java.base/java.time.LocalTime.get0(LocalTime.java:710) at java.base/java.time.LocalTime.getLong(LocalTime.java:688) at java.base/java.time.format.DateTimePrintContext.getValue(DateTimePrintContext.java:308) at java.base/java.time.format.DateTimeFormatterBuilder$NumberPrinterParser.format(DateTimeFormatterBuilder.java:2914) at java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2529) at java.base/java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1905) at java.base/java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1879) at java.base/java.time.LocalTime.format(LocalTime.java:1444) at org.odftoolkit.odfdom.doc.table.OdfTableCell.setCellFormatString(OdfTableCell.java:1620) at org.odftoolkit.odfdom.doc.table.OdfTableCell.setFormatString(OdfTableCell.java:1581) at org.odftoolkit.odfdom.doc.table.TableCellTest.testSetGetFormat(TableCellTest.java:278)

PS: Could you please take a look at it and work on this PR?

svanteschubert avatar May 16 '25 11:05 svanteschubert

The problem is the 'M'. Both SimpleDateFormat (pre-Java 8) and DateTimeFormatter (java.time) interpret this as 'Month of year', and there is no month in a time (without a date). For minutes, 'm' has to be used for both formatting classes. Also, 'S' is for milliseconds, so HH:MM:SS should really be HH:mm:ss. So why is it not all lowercase? Because h is for the hour (1-12) in am/pm time.

I'll have a look at this PR later.

xzel23 avatar May 18 '25 09:05 xzel23

Let's summarise a bit on the date patterns we have been discussing.

We are an ODF library, our ground truth is the ODF specification. Unfortunately, the Date patterns are not part of the ODF specification - reusing simply the ISO standard 8601 standard - but every ODF application can use its date patterns, see : image see https://docs.oasis-open.org/office/OpenDocument/v1.4/OpenDocument-v1.4-part4-formula.html#TEXT

These patterns are named in the ODF specification FormatCode, which is a sequence of characters with an implementation-defined meaning. Unfortunately, the ISO 8601 standard is usually behind a pay wall, but there is a PDF draft of working group 5 available from 2016. But an implementation of this standard might be - hard to prove, not having the spec - the Java implementation available since JDK 8: See https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#patterns

Nevertheless, in the file formats - within the XML - only the "FormatCode" of an ODF application will be found and must be translated back and forth from their implementation-defined to the JDK class.

In this issue, we might start making the existing code, which relies on IBM Lotus Symphony (based on OpenOffice.org), run with LibreOffice: The FormatCode of LibreOffice is defined in its online help: https://help.libreoffice.org/24.8/en-GB/text/shared/01/05020301.html?&DbPAR=CALC&System=WIN

When all @Ignore are being removed from https://github.com/tdf/odftoolkit/blob/master/odfdom/src/test/java/org/odftoolkit/odfdom/doc/table/TableCellTest.java The problems become apparent, in addition to the text normalisation #229, which I will focus on now. @xzel23 Would you like to give this format code fix a try, as it was related to the time/date refactorings you did earlier?

If anyone likes to provide MS Office ODF FormatCode support, any PR would be welcome! :-)

svanteschubert avatar May 21 '25 12:05 svanteschubert

Hi @svanteschubert, yes, I wanted to do that anyway. My suggestion is to make the get- and set-methods accept the LibreOffice format strings, but also add alternative methods, maybe setIsoDateFormat() or setJavaDateFormat(), that accept the Java SimpleDateFormatter/DateTimeFormatter ISO compliant codes, because that's what Java developers are used to and many have in their code base.

PS: The symbols to be used in format strings are defined in the first part of the standard ("ISO 8601-1:2019(en)"), under "3.2.4 Symbols used in place of digits or signs". While you need to pay (IMHO too much) money to obtain a copy of the standard, that section is available free to read when you click the "Read Sample" button below the rendered page on the left of the ISO 8601 page.

xzel23 avatar May 21 '25 13:05 xzel23