Disabling char normalization does not work

Open tenebrius opened this issue 2 weeks ago • 0 comments

Bug

When DocumentConverter converts a standard text-based PDF to Markdown (no OCR involved), the Chinese character "一" (U+4E00) is replaced with "-".

Steps to reproduce

model = nlp_model(loglevel="debug", text_ordering=True)
model.apply_on_text("一些")
>> -些

I tried disabling normalization:

self.model = nlp_model(loglevel="debug", text_ordering=True, normalise_chars=False, normalise_text=False)

But that made no difference: "一" is still replaced with "-".
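To help narrow down where the substitution comes from, here is a minimal stdlib-only sketch that compares the input string with the output reported above. The helper names `describe` and `changed_chars` are hypothetical and not part of docling; the sketch only assumes the replacement character is the ASCII hyphen-minus, as it appears in the output above.

```python
import unicodedata


def describe(text):
    """Return (char, code point, Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def changed_chars(before, after):
    """Pair up characters that differ at the same position.

    Assumes both strings have the same length, which holds for the
    one-to-one substitution reported in this issue.
    """
    return [(b, a) for b, a in zip(before, after) if b != a]


original = "一些"
observed = "-些"  # output reported in the steps to reproduce (assumed ASCII "-")

for row in describe(original):
    print(row)
for row in describe(observed):
    print(row)

print(changed_chars(original, observed))

# Standard Unicode normalization (NFKC) leaves U+4E00 unchanged.
print(unicodedata.normalize("NFKC", original) == original)
```

Since NFKC does not alter "一", the substitution does not look like standard Unicode normalization, which would be consistent with it coming from a separate character mapping that the two flags do not control. That is a guess based on this check, not something confirmed from the library source.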

Docling version

2.20.0

Python version

3.11

tenebrius, Feb 10 '25 07:02