UnicodeEncodeError: 'charmap' codec can't encode character '\u015b' in position 895: character maps to <undefined>

Open giuliastro opened this issue 1 year ago • 2 comments

Hello, I get this error when using Docling. I also added version and command line parameters. Thank you in advance.

Bug

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Software\Docling\venv\Scripts\docling.exe\__main__.py", line 7, in <module>
  File "C:\Software\Docling\venv\Lib\site-packages\typer\main.py", line 338, in __call__
    raise e
  File "C:\Software\Docling\venv\Lib\site-packages\typer\main.py", line 321, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Software\Docling\venv\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Software\Docling\venv\Lib\site-packages\typer\core.py", line 665, in main
    return _main(
           ^^^^^^
  File "C:\Software\Docling\venv\Lib\site-packages\typer\core.py", line 197, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Software\Docling\venv\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Software\Docling\venv\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Software\Docling\venv\Lib\site-packages\typer\main.py", line 703, in wrapper
    return callback(**use_params)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Software\Docling\venv\Lib\site-packages\docling\cli\main.py", line 389, in convert
    export_documents(
  File "C:\Software\Docling\venv\Lib\site-packages\docling\cli\main.py", line 112, in export_documents
    conv_res.document.save_as_markdown(
  File "C:\Software\Docling\venv\Lib\site-packages\docling_core\types\doc\document.py", line 1942, in save_as_markdown
    fw.write(md_out)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.2288.0_x64__qbz5n2kfra8p0\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u015b' in position 895: character maps to <undefined>
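
The last two frames of the traceback show `fw.write(md_out)` going through the `cp1252` codec, which has no mapping for U+015B (ś). The following is a minimal sketch that reproduces the same failure outside Docling and shows that an explicit UTF-8 encoding avoids it. The helper `try_write` and the sample text are hypothetical illustrations, not Docling code, and this says nothing about how `save_as_markdown` actually opens its file.

```python
import os
import tempfile

# Sample text containing U+015B, the character named in the error message.
text = "s\u015b"


def try_write(path, text, encoding):
    """Return True if `text` can be written to `path` using `encoding`."""
    try:
        with open(path, "w", encoding=encoding) as fw:
            fw.write(text)
        return True
    except UnicodeEncodeError:
        # Raised when the codec has no byte sequence for a character.
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.md")
    # cp1252 is the codec that appears in the traceback.
    cp1252_ok = try_write(path, text, "cp1252")
    # UTF-8 can encode every Unicode code point.
    utf8_ok = try_write(path, text, "utf-8")

print("cp1252:", cp1252_ok, "utf-8:", utf8_ok)
```

When `open()` is called in text mode without an `encoding` argument, Python falls back to the locale's preferred encoding, which is commonly `cp1252` on Western-European Windows installations.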

Steps to reproduce

docling -v --to text --image-export-mode placeholder --ocr --ocr-lang it,en .\ERM.pdf ...
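
A possible workaround, not confirmed anywhere in this thread, is to run the same command with Python's UTF-8 mode enabled through the `PYTHONUTF8` environment variable, so that text files opened without an explicit encoding default to UTF-8. The sketch below only assumes that a `docling` executable is on `PATH`; the helper names `utf8_env` and `run_docling` are hypothetical.

```python
import os
import subprocess


def utf8_env(base=None):
    """Return a copy of the environment with Python's UTF-8 mode enabled."""
    env = dict(os.environ if base is None else base)
    # PYTHONUTF8=1 turns on UTF-8 mode for Python processes started with it.
    env["PYTHONUTF8"] = "1"
    return env


def run_docling(args):
    """Run the docling CLI (assumed to be on PATH) with UTF-8 mode enabled."""
    return subprocess.run(["docling", *args], env=utf8_env(), check=False)


# Example call (not executed here), mirroring part of the reported command:
# run_docling(["-v", "--to", "text", r".\ERM.pdf"])
```

Whether this resolves the error depends on how the output file is opened inside Docling, so treat it as something to try rather than a fix.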

Docling version

2.11 ...

giuliastro, Dec 12 '24 09:12

@giuliastro Could you please provide a sample PDF that triggers this error? We need one to investigate.

cau-git, Dec 18 '24 10:12

Possibly related to https://github.com/DS4SD/docling/issues/598

cau-git, Dec 18 '24 10:12