docling Disabling char normalization does not work

Open tenebrius opened this issue 2 weeks ago • 0 comments

The Chinese character "一" is changed to "-" when DocumentConverter converts a standard text PDF to Markdown. No OCR is involved. The same substitution can be reproduced directly with the NLP model:

model = nlp_model(loglevel="debug", text_ordering=True)
model.apply_on_text("一些")
>> -些

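I tried disabling both normalization options when constructing the model:

self.model = nlp_model(loglevel="debug", text_ordering=True, normalise_chars=False, normalise_text=False)

But that had no effect: "一" is still replaced with "-".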
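To pin down exactly which characters are being substituted, a small standard-library check can help. This is only a diagnostic sketch, not part of docling: `describe` and `changed_chars` are hypothetical helper names, and it assumes the "-" in the output is the ASCII hyphen-minus, as it appears in the output above.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return a code point label such as 'U+4E00 CJK UNIFIED IDEOGRAPH-4E00'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


def changed_chars(before: str, after: str) -> list[tuple[int, str, str]]:
    """List (index, before, after) descriptions for positions that differ.

    Only meaningful when both strings have the same length, i.e. when
    characters were substituted one for one, as in this report.
    """
    if len(before) != len(after):
        raise ValueError("strings differ in length; compare them manually")
    return [
        (i, describe(a), describe(b))
        for i, (a, b) in enumerate(zip(before, after))
        if a != b
    ]


# Input and output strings copied from the report.
for index, src, dst in changed_chars("一些", "-些"):
    print(f"position {index}: {src} -> {dst}")

# Standard Unicode compatibility normalization (NFKC) leaves U+4E00 untouched,
# so the substitution is not something NFKC alone would produce.
print(unicodedata.normalize("NFKC", "一") == "一")
```

Running the same comparison on a full page of extracted text versus the expected text (when the lengths match) would show whether other CJK characters are affected as well.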

Docling version: 2.20.0

Python version: 3.11

Feb 10 '25 07:02 tenebrius