URL HTTP UnicodeDecodeError: 'utf-8' codec can't decode byte

Open dromeuf opened this issue 1 year ago • 0 comments

Bug

Running docling on this URL:

docling https://agoraclass.fltr.ucl.ac.be/concordances/cicero_de_diuin01/lecture/1.htm

produces the following error:

  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 49: invalid continuation byte

The error does not occur if I first download the page with wget, then open it in an editor and re-save it as UTF-8.
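
For illustration, here is a minimal sketch of the kind of decoding fallback that would avoid this failure. It is not docling's code: the helper name `decode_html_bytes`, the sample bytes, and the choice of Latin-1 as fallback are all assumptions, made because byte 0xe0 is "à" in Latin-1 but a multi-byte lead byte in UTF-8. The actual encoding of the page has not been confirmed here.

```python
def decode_html_bytes(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode page bytes as UTF-8, falling back to a single-byte encoding.

    Hypothetical helper for illustration only; not part of docling.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a code point, so with the
        # default fallback this second attempt cannot raise.
        return raw.decode(fallback)


# Hypothetical sample: "voilà tout" encoded as Latin-1, where "à" is byte 0xe0.
# Strict UTF-8 decoding rejects it because 0xe0 must be followed by
# continuation bytes, and the next byte here is a plain space.
sample = "voilà tout".encode("latin-1")
print(decode_html_bytes(sample))
```

A more careful version would honour the charset declared in the HTTP `Content-Type` header or in the page's `<meta>` tag before falling back to a guess.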

Steps to reproduce

...

Docling version

Docling version: 2.14.0
Docling Core version: 2.12.1
Docling IBM Models version: 3.1.0
Docling Parse version: 3.0.0

Python version

3.11

dromeuf, Dec 21 '24 22:12