Bug in Excel extraction truncates digits when last 3 before decimal are identical
Bug
Hello, I encountered a bug when importing data from Excel files. When the last three digits before the decimal point of a number are identical, one of those digits is dropped during the import.
Examples:
- 24000 is extracted as 2400
- 24333 is extracted as 2433
- 24565 is correctly extracted as 24565
It seems the logic that handles number formatting or string conversion might be mistakenly collapsing repeated digits in certain conditions.
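To pick further test values, the pattern described above can be written as a small predicate. This is only a sketch of the reported symptom, not Docling's actual logic; the name `endsWithTripleDigit` is made up for illustration.

```typescript
// Illustrative model of the reported symptom only; this is not Docling's code.
// Returns true when the last three digits of the integer part are identical.
function endsWithTripleDigit(value: number): boolean {
  const integerPart = Math.trunc(Math.abs(value)).toString();
  return /(\d)\1\1$/.test(integerPart);
}

// The three values from this report.
const samples = [24000, 24333, 24565];
const affected = samples.filter(endsWithTripleDigit);
console.log(affected); // values that match the pattern described in this report
```

If the hypothesis holds, only values for which the predicate returns true should come out truncated.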
Steps to reproduce
- Create an Excel file with numeric values like 24000, 24333, and 24565.
- Import the file using Docling.
- Observe how the values are misrepresented in the resulting data.
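The last step can be checked automatically with a helper along these lines. It is a minimal sketch: `findMissingValues` is a hypothetical name, and the sample Markdown string is built from the values reported above, not copied from a real Docling response.

```typescript
// Hypothetical helper: returns the expected values that do not appear
// as standalone numbers in the exported Markdown.
function findMissingValues(markdown: string, expected: number[]): number[] {
  // Collect digit runs, optionally followed by a decimal part (e.g. "24000" or "24000.0").
  const tokens = markdown.match(/\d+(?:\.\d+)?/g) ?? [];
  const seen = new Set(tokens.map((token) => Number(token)));
  return expected.filter((value) => !seen.has(value));
}

// Assumed sample: a table row containing the truncated values from this report.
const exportedMarkdown = "| 2400 | 2433 | 24565 |";
console.log(findMissingValues(exportedMarkdown, [24000, 24333, 24565]));
```

An empty result means every expected value was found. Note that the helper does not handle thousands separators such as "24,000".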
Docling version
1.16.0
Code (in TypeScript):
const formData = new FormData();
formData.append("files", file);
formData.append("ocr_engine", "easyocr");
formData.append("pdf_backend", "dlparse_v2");
formData.append("from_formats", "pdf");
formData.append("from_formats", "docx");
formData.append("from_formats", "xlsx");
formData.append("from_formats", "pptx");
formData.append("from_formats", "image");
formData.append("image_export_mode", "placeholder");
formData.append("ocr_lang", "en");
formData.append("ocr_lang", "fr");
formData.append("table_mode", "fast");
formData.append("abort_on_error", "false");
formData.append("to_formats", "md");
formData.append("return_as_file", "false");
formData.append("do_ocr", enableOcr ? "true" : "false");
formData.append("force_ocr", "false");
const doclingResponse = await fetch(doclingUrl, {
  method: "POST",
  body: formData,
});
Thanks @ayoub-louati for reporting this issue. I was not able to reproduce the error following the steps that you described. The numbers appear correctly parsed. Could you please:
- Update the Docling library to the latest release
- Share the output of executing docling --version
- Share an Excel file that triggers the error you described
Thanks for your response