ClosedXML icon indicating copy to clipboard operation
ClosedXML copied to clipboard

The handling of floating point roundoff error can lead to equality checks breaking

Open Jonathan--V opened this issue 1 year ago • 1 comments

Read and complete the full issue template

Do not randomly delete sections. They are here for a reason.

Do you want to request a feature or report a bug?

  • [x] Bug
  • [ ] Feature
  • [ ] Question

Did you test against the latest CI build?

  • [ ] Yes
  • [x] No

If you answered No, please test with the latest development build first.

Version of ClosedXML

0.105.0-rc

What is the current behavior?

Evaluating certain sums (such as -50000+36123.95) in ClosedXML results in a tiny roundoff error (in this case, -13876.050000000003). This then causes an Excel formula that checks for equality between the ClosedXML calculation and the correct value to fail. For example, "-50000+36123.95=-13876.05" evaluates to FALSE. It also causes the wrong number format section to be used when the ClosedXML calculated result is subtracted from the correct value (for example, ClosedXML might use the format for negative numbers rather than the format for zero values as Excel correctly would).

What is the expected behavior or new feature?

Evaluating sums such as -50000+36123.95 should result in the exact value of -13876.05, without introducing a roundoff error. This will ensure that equality works correctly, as in Excel.

Is this a regression from the previous version?

No. The issue is also present in 0.104.2.

Reproducibility

This is an important section. Read it carefully. Failure to do so will cause a 'RTFM' comment.

Without a code sample, it is unlikely that your issue will get attention. Don't be lazy. Do the effort and assist the developers to reproduce your problem. Code samples should be minimal complete and verifiable. Sample spreadsheets should be attached whenever applicable. Remove sensitive information.

Code to reproduce problem:

public static void Main()
{
    Debug.WriteLine($"Version: {typeof(IXLWorkbook).Assembly.GetName().Version}");

    // The following line is the core of the issue - the rest is just a demonstration of how it breaks things.
    // Results in "-13876.050000000003"
    Debug.WriteLine(XLWorkbook.EvaluateExpr("-50000+36123.95").GetNumber().ToString("R17"));


    string filename = @"C:\ClosedXML and ExcelNumberFormat\Incorrect Rounding.xlsx";
    using (XLWorkbook workbook = new XLWorkbook(filename))
    {
        IXLWorksheet worksheet = workbook.Worksheet("Sheet1");
       
        IXLCell a2 = worksheet.Cell("A2");
        Debug.WriteLine($"Element 1 Value ({a2.Address}): {a2.Value.GetNumber():R17}");
        Debug.WriteLine($"Element 1 Formatted String ({a2.Address}): {a2.GetFormattedString()}");

        IXLCell b2 = worksheet.Cell("B2");
        Debug.WriteLine($"Element 2 Value ({b2.Address}): {b2.Value.GetNumber():R17}");
        Debug.WriteLine($"Element 2 Formatted String ({b2.Address}): {b2.GetFormattedString()}");

        IXLCell c2 = worksheet.Cell("C2");
        c2.InvalidateFormula();
        Debug.WriteLine($"Actual Sum Value({c2.Address}): {c2.Value.GetNumber():R17}");
        Debug.WriteLine($"Actual Sum Formatted String({c2.Address}): {c2.GetFormattedString()}");

        IXLCell d2 = worksheet.Cell("D2");
        Debug.WriteLine($"Expected Sum Value ({d2.Address}): {d2.Value.GetNumber():R17}");
        Debug.WriteLine($"Expected Sum Formatted String ({d2.Address}): {d2.GetFormattedString()}");

        IXLCell e2 = worksheet.Cell("E2");
        e2.InvalidateFormula();
        Debug.WriteLine($"Result Formatted String ({e2.Address}): {e2.GetFormattedString()}");

        IXLCell f2 = worksheet.Cell("F2");
        f2.InvalidateFormula();
        Debug.WriteLine($"Formatted Difference Formatted String ({f2.Address}): {f2.GetFormattedString()}");

        /*
         * ClosedXML 0.105.0-rc Output:
         *      Version: 0.105.0.0
         *      -13876.050000000003
         *      Element 1 Value (A2): -50000
         *      Element 1 Formatted String (A2): -50000
         *      Element 2 Value (B2): 36123.95
         *      Element 2 Formatted String (B2): 36123.95
         *      Actual Sum Value(C2): -13876.050000000003
         *      Actual Sum Formatted String(C2): -13876.05
         *      Expected Sum Value (D2): -13876.05
         *      Expected Sum Formatted String (D2): -13876.05
         *      Result Formatted String (E2): FAILURE: difference in values
         *      Formatted Difference Formatted String (F2): NEGATIVE
         * 
         * However, in Excel, cell E2 contains 'Match' and cell F2 contains 'Correct'.
         */
    }
}

Jonathan--V avatar Feb 18 '25 21:02 Jonathan--V

hi!

I'll try to explain what is happening. I can divide "issue" in 2 parts:

  1. how numbers stored in the xlsx. Please read this article to have better context. fyi: .xlsx format is not "real" file format - basically it's zipped xml files. You can rename your file to Incorrect Rounding.zip and easily check the content. let's see what is really stored in the worksheet (Incorrect Rounding.zip => xl/worksheets/sheet1.xml)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac xr xr2 xr3" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xr:uid="{D8190CE9-82ED-4879-AE58-89B2A8446850}">
	<dimension ref="A1:F2"/>
	<sheetViews><sheetView tabSelected="1" workbookViewId="0">
	<selection activeCell="F2" sqref="F2"/></sheetView>
	</sheetViews>
	<sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
	<cols>
		<col min="1" max="1" width="9.42578125" bestFit="1" customWidth="1"/>
		<col min="2" max="2" width="11" bestFit="1" customWidth="1"/>
		<col min="3" max="3" width="11.42578125" bestFit="1" customWidth="1"/>
		<col min="4" max="4" width="13.140625" bestFit="1" customWidth="1"/>
		<col min="5" max="5" width="6.42578125" bestFit="1" customWidth="1"/>
		<col min="6" max="6" width="19.5703125" bestFit="1" customWidth="1"/>
	</cols>
	<sheetData>
	<row r="1" spans="1:6" x14ac:dyDescent="0.25">
		<c r="A1" t="s"><v>3</v></c><c r="B1" t="s"><v>4</v></c>
		<c r="C1" t="s"><v>0</v></c><c r="D1" t="s"><v>1</v></c>
		<c r="E1" t="s"><v>2</v></c><c r="F1" t="s"><v>5</v></c>
	</row>
	<row r="2" spans="1:6" x14ac:dyDescent="0.25">
		<c r="A2" s="1"><v>-50000</v></c>
		<c r="B2" s="1"><v>36123.949999999997</v></c>
		<c r="C2" s="1"><f>A2+B2</f><v>-13876.050000000003</v></c>
		<c r="D2" s="1"><v>-13876.05</v></c>
		<c r="E2" t="str"><f>IF(C2=D2,"Match","FAILURE: difference in values")</f><v>Match</v></c>
		<c r="F2" s="2"><f>C2-D2</f><v>0</v></c>
	</row>
	</sheetData>
	<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>

As you can see B2 = 36123.949999999997 and C2 = -13876.050000000003, and that what is returned by OpenXmlPartReader (lib uses it to parse xml). Huh, not something that you expect unless you read the article I mentioned before. Related topic can be found here https://github.com/dotnet/Open-XML-SDK/issues/1413

  1. why in the end 36123.949999999997 is 36123.95 and -13876.050000000003 is still -13876.050000000003. That is something to with .Net itself.
double.TryParse("36123.949999999997", out var num1);
double.TryParse("-13876.050000000003", out var num2);

double.TryParse("-13876.050000000009", out var num3);
double.TryParse("-13876.050000000099", out var num4);
double.TryParse("-13876.059000000003", out var num5);

This code produces next output:

36123.95
-13876.050000000003

-13876.050000000008
-13876.0500000001
-13876.059000000003

why this is happening? because of limited precision for floating numbers floating-point-numeric-types when this is happening? tbh don't know for sure - again something to do with precision which is not constant and is ~15-17 digits. Sometimes it's rounded and sometimes is not.

Related topic can be found here https://github.com/dotnet/runtime/issues/12035

What to do in this case?

  1. you can use ROUND function in your formulas if the end goal to have output file with correct display values.
  2. you can use Math.Round (or any other) if the end goal to parse input file.

P.S. Precision round value is something for you to decide based in your business logic.

vova3211 avatar Feb 22 '25 14:02 vova3211