docling
Disabling char normalization does not work
Bug
When DocumentConverter converts a standard text-based PDF to Markdown (no OCR), the Chinese character "一" (U+4E00) is changed to "-".
Steps to reproduce
model = nlp_model(loglevel="debug", text_ordering=True)
model.apply_on_text("一些")
>> -些
I tried disabling normalization:
self.model = nlp_model(loglevel="debug", text_ordering=True, normalise_chars=False, normalise_text=False)
But the result is the same: "一" is still replaced with "-".
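To pin down exactly which characters are involved, a small standard-library check along these lines may help. It is only a diagnostic sketch and not part of docling: `describe` is a hypothetical helper, and the `observed` string is typed in by hand from the output reported above (assuming the replacement is an ASCII hyphen) rather than produced by `nlp_model`. It also checks that standard Unicode NFKC normalization leaves "一" untouched, which would suggest the substitution comes from a custom character mapping rather than from Unicode normalization itself.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per character in text (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


source = "一些"    # input passed to apply_on_text
observed = "-些"  # reported output, copied by hand; assumed to contain an ASCII hyphen

# Show the exact code points before and after the conversion
print(describe(source))
print(describe(observed))

# Standard Unicode normalization should not alter U+4E00
nfkc_unchanged = unicodedata.normalize("NFKC", source) == source
print(nfkc_unchanged)
```

Running `describe` on the actual string returned by the model, instead of the hand-copied one, would confirm whether the replacement is U+002D or some other dash-like character.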
Docling version
2.20.0
Python version
3.11