Bug in Excel extraction truncates digits when last 3 before decimal are identical

Open • ayoub-louati opened this issue 9 months ago • 1 comment

Bug

Hello, I encountered a bug when importing data from Excel files. When the last three digits before the decimal point of a number are identical, one of those digits is dropped during import.

Examples:

  • 24000 is extracted as 2400
  • 24333 is extracted as 2433
  • 24565 is correctly extracted as 24565

It seems the logic that handles number formatting or string conversion might be mistakenly collapsing repeated digits in certain conditions.

Steps to reproduce

  1. Create an Excel file with numeric values like 24000, 24333, and 24565.
  2. Import the file using Docling.
  3. Observe how the values are misrepresented in the resulting data.
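
To make step 3 easier to check, here is a minimal sketch of a helper that reports which expected values are missing from the Markdown returned by the conversion. `findMissingValues` and `sampleOutput` are hypothetical names used for illustration only, not part of Docling, and the sketch assumes the values are rendered as plain integers without thousands separators.

```typescript
// Hypothetical helper, not part of Docling: returns the expected values
// that do not appear as whole digit runs in the Markdown output.
function findMissingValues(markdown: string, expected: number[]): number[] {
  // Collect every run of digits as a standalone token, so that "2400"
  // is not counted as a match for "24000".
  const tokens = new Set(markdown.match(/\d+/g) ?? []);
  return expected.filter((value) => !tokens.has(String(value)));
}

// Illustrative input only: a table row shaped like the truncated output
// described in this report, not real Docling output.
const sampleOutput = "| 2400 | 2433 | 24565 |";

// Lists the values that were truncated or otherwise lost.
console.log(findMissingValues(sampleOutput, [24000, 24333, 24565]));
```

An empty result would mean every expected value survived the import unchanged.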

Docling version

1.16.0

Code (in TypeScript):

    const formData = new FormData();
    formData.append("files", file);
    formData.append("ocr_engine", "easyocr");
    formData.append("pdf_backend", "dlparse_v2");
    formData.append("from_formats", "pdf");
    formData.append("from_formats", "docx");
    formData.append("from_formats", "xlsx");
    formData.append("from_formats", "pptx");
    formData.append("from_formats", "image");
    formData.append("image_export_mode", "placeholder");
    formData.append("ocr_lang", "en");
    formData.append("ocr_lang", "fr");
    formData.append("table_mode", "fast");
    formData.append("abort_on_error", "false");
    formData.append("to_formats", "md");
    formData.append("return_as_file", "false");
    formData.append("do_ocr", enableOcr ? "true" : "false");
    formData.append("force_ocr", "false");

    const doclingResponse = await fetch(doclingUrl, {
      method: "POST",
      body: formData,
    });

ayoub-louati, Apr 14 '25 15:04

Thanks @ayoub-louati for reporting this issue. I was not able to reproduce the error by following the steps you described. The numbers appear to be parsed correctly. Could you please:

  • Update the Docling library to the latest release
  • Share the output of executing docling --version
  • Share an Excel file that triggers the error you described

ceberam, May 28 '25 18:05

Thanks for your response

ayoub-louati, Jun 11 '25 12:06