Fine-tuning the content section for other languages
UniTable is a powerful recognition tool, but I want to train table content recognition so that it supports other languages. Do you have any suggestions?
I would suggest fine-tuning the OCR branch on the target language; UniTable should then work out of the box.
In the bbox recognition step, a single bbox may contain a large amount of text or gaps, which can lead to content loss or misalignment. Do you have any suggestions for this situation? Which model or debugging method is used for pre-training or fine-tuning unitable_large_bbox.pt?
I would suggest fine-tuning the OCR branch on the target language; UniTable should then work out of the box.
May I ask what you changed when fine-tuning? Did you change the tokenizer?
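On the tokenizer question, one quick check before fine-tuning is whether the existing content vocabulary already covers the characters of the target language. The snippet below is only a rough sketch under assumptions: `vocab` and `cells` are made-up placeholders, not UniTable's actual vocabulary or data, and `uncovered_characters` is a hypothetical helper, not part of the UniTable code base. It does a plain character-level comparison, which may not match how the real tokenizer splits text.

```python
from collections import Counter


def uncovered_characters(texts, vocab):
    """Count non-space characters in `texts` that appear in no token of `vocab`."""
    # Every character that occurs in at least one vocabulary token.
    known = set("".join(vocab))
    missing = Counter()
    for text in texts:
        for ch in text:
            if not ch.isspace() and ch not in known:
                missing[ch] += 1
    return missing


# Placeholder data (assumption): a tiny English-only vocabulary and a few cell strings.
vocab = ["a", "b", "c", "##ing", "table", "1", "2"]
cells = ["table 12", "表格 12", "合计"]

missing = uncovered_characters(cells, vocab)
total_chars = sum(len("".join(text.split())) for text in cells)
coverage = 1 - sum(missing.values()) / total_chars

print("characters missing from vocab:", dict(missing))
print(f"character coverage: {coverage:.2%}")
```

If many characters are missing, the vocabulary probably has to be extended or rebuilt for the target language before fine-tuning; if coverage is close to 100%, fine-tuning with the existing tokenizer may be enough.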