fastexcel icon indicating copy to clipboard operation
fastexcel copied to clipboard

fasterreader excel converting 1.1 cell value to 1.1000000000000001 but reading other values perfectly

Open naveensingh10 opened this issue 5 years ago • 8 comments

naveensingh10 avatar Aug 22 '19 16:08 naveensingh10

How do you read cell values? Please post a code snippet.

ochedru avatar Sep 02 '19 17:09 ochedru

Looks like a standard floating point proglem

rzymek avatar Sep 02 '19 18:09 rzymek

Hello, I just stumbled on this bug and it is quite annoying. I don't think that it can be closed with "standard floating point problem" answer although you are partially right that it is related. This issue should not happen if you use BigDecimal correctly. I did not have a chance to look into your code yet but I suspect that there is a double used somewhere in the process of calculation of the final value. Perhaps simply the BigDecimal is not correctly constructed?

Check the following outputs: System.out.println(new BigDecimal("1.1")); --> prints 1.1 System.out.println(BigDecimal.valueOf(1.1)); --> prints 1.1 System.out.println(new BigDecimal(1.1)); --> prints 1.100000000000000088817841970012523233890533447265625

I guess that the problem is somewhere here.

I would actually consider this a high priority issue since it really changes the data and the library is not usable while there is this bug present...

I hope we can solve this issue.

Thank you for the nice work on the library.

Palinoo avatar Oct 23 '20 16:10 Palinoo

You're right. I was too quick to dismiss it as floating point issue as fastexcel's API exposes BigDecimal not a double. I did check the code just now and BigDecimal is properly constructed from String: https://github.com/dhatim/fastexcel/blob/fef8a7ac5b2a2ae0ad9d98198bb5bb81a957c7ca/fastexcel-reader/src/main/java/org/dhatim/fastexcel/reader/RowSpliterator.java#L224

The only double usage in in date handling methods. There's no float anywhere in the fastexcel-reader.

Could you share your xlsx file and some code that reproduces the problem?

rzymek avatar Oct 23 '20 20:10 rzymek

The bug can be reproduced with a simple excel file which contains a number 1.1 in a cell. E.g. the following test.xlsx I am reading an excel file which has actually a lot of numbers that are read incorrectly (just so that it doesn't look like this is the only number that is read incorrectly)

You can check it with a simple code like this one:

    public static void main(String[] args) throws IOException {
        try (InputStream is = new FileInputStream("test.xlsx");
             ReadableWorkbook wb = new ReadableWorkbook(is)) {
            Sheet sheet = wb.getFirstSheet();
            try (Stream<Row> rows = sheet.openStream()) {
                rows.forEach(r -> {
                    r.forEach(cell -> {
                        assertEquals(BigDecimal.valueOf(1.1), cell.asNumber());
                    });
                });
            }
        }
    }

I tried to debug it a bit and went down to fasterxml TextBuffer where the number is already read in a wrong format. Then I wondered if the bug doesn't have its roots directly in how excel stores numbers and this seems to be the root of the problem. According to this https://en.wikipedia.org/wiki/Numeric_precision_in_Microsoft_Excel excel stores numbers in double-precision floating-point format https://en.wikipedia.org/wiki/Double-precision_floating-point_format which is the format that is used by Java double. So it looks like you are trying to make the read value unnecessarily precise by using BigDecimal. Using a simple double is just enough, you will not get more precision by using a BigDecimal. Apache poi reads numbers as double as well.

I am not sure how is the formula calculation implemented - is this calculated by your library? If yes, then you could perhaps get a higher precision by using BigDecimal during these calculations but I would say that double should be just fine for almost all users.

I am actually not sure why the parsers read the number as such a weird string. The following should however work just fine to convert the string back to double

Double.valueOf(s);

I am for the moment going to fix the bug in my code by reading the numbers as following:

BigDecimal.valueOf(cell.asNumber().doubleValue());

I need to read it like this because I need it as a BigDecimal but of course for most people do not need to do the conversion to BigDecimal. This also explains why there were not that many complaints about this issue - most people simply convert the number to double and never notice that the BigDecimal wasn't correctly read.

Let me know what you think about it.

Thank you

Palinoo avatar Oct 24 '20 20:10 Palinoo

The actual value in test.xlsx is 1.1000000000000001:

<sheetData>
  <row r="1" spans="1:1" x14ac:dyDescent="0.2">
    <c r="A1" s="1">
      <v>1.1000000000000001</v>
    </c>
  </row>
</sheetData>

This test passes:

try (InputStream is = Issue92.class.getResourceAsStream("/test.xlsx");
     ReadableWorkbook wb = new ReadableWorkbook(is)) {
  Sheet sheet = wb.getFirstSheet();
  try (Stream<Row> rows = sheet.openStream()) {
    rows.forEach(r -> {
      r.forEach(cell -> {
        assertEquals("1.1000000000000001", cell.getText());
        assertEquals(new BigDecimal("1.1000000000000001"), cell.asNumber());
        assertEquals(1.1000000000000001, cell.asNumber().doubleValue());
        assertEquals(1.1, cell.asNumber().doubleValue());
        assertEquals(1.1, 1.1000000000000001);
      });
    });
  }
}

This meas 1.1000000000000001 is not representable precisely in a double. The closest double representation is 1.1.

Looks like fastexcel-reader is more precise then Excel and LibreOffice, which I admit can be treated as an issue.

rzymek avatar Oct 31 '20 11:10 rzymek

Hello, sorry for not coming earlier but here is my view:

The test that I expect to pass is really simple:

  1. As a user I use Microsoft Excel to create an excel sheet. I enter there 1.1
  2. I send this file to an application which uses fast excel and expect that the application reads 1.1

This is a really simple test that I expect should pass without any questions. Unfortunately this test doesn't pass. Now we may debate about if excel stores the number that I entered as 1.1000000000000001 but that discussion is pointless in my opinion. As a user I am not going to extract the xls file to an xml and edit the values in xml. I am using the program from Microsoft to edit the file. Of course we can debate as well about how Microsoft stores the numbers, perhaps it is not ideal, however the Microsoft product is widely used and we cannot change that.

I understand your point that you are dealing with the excel file in xml format. However users don't read or write the file that way. The fact is that the simple user test above fails and I understand that perhaps some other test that involves reading the data from xml passes but that test is irrelevant.

So in my opinion this is a really serious bug that needs to be fix.

Please try the simple test yourself. Try to create an excel file yourself and check if what you store in the file (using Microsoft Excel) is the same that fast excel reads.

Please let me know if I am doing something wrong, although I do not see much space for an error here since I am using Excel as an user like anybody else would. The only possibility could be that my version of Excel is buggy but that is why it would best if you could test this as well.

Thank you

Palinoo avatar Nov 22 '20 14:11 Palinoo