
the fine-tuning of language in the content section

Open num3num opened this issue 1 year ago • 3 comments

UniTable is a powerful table recognition tool, but I want to train its table content recognition to support other languages. Do you have any suggestions?

num3num • Jul 18 '24 10:07

I would suggest finetuning the OCR branch with the targeted language and UniTable should work out-of-the-box.

ShengYun-Peng • Jul 22 '24 14:07

During bbox recognition, a single bbox may contain a large amount of text or gaps, which can lead to content loss or misalignment. Do you have any suggestions for handling this? Also, which model or procedure should be used to pretrain or fine-tune unitable_large_bbox.pt?

num3num • Jul 29 '24 07:07

I would suggest finetuning the OCR branch with the targeted language and UniTable should work out-of-the-box.

May I ask what you changed when fine-tuning? Did you modify the tokenizer?

Anananana1568 • Feb 01 '25 13:02