unilm
LayoutReader: how to get left to right and top to bottom bboxes from OCR results such as Tesseract
Describe Model I am using (LayoutReader): Thanks for your work. I am trying to run inference using your pretrained model and OCR results from Tesseract, but I have run into some problems:
- The input bboxes should be ordered left to right and top to bottom. How can I do this using the bbox coordinates?
- I tried sorting the bboxes by their top-left coordinates (x, y) with
bboxes = sorted(bboxes, key=lambda x: [x[0], x[1]])
but the results are much worse than using the bbox order from Tesseract directly.
@animebing Which dataset are you using?
I am not using any dataset. I just use Tesseract to get OCR results from an image and then try to run inference with the provided pretrained model.
@animebing, LayoutReader is designed for training models using your own datasets or ReadingBank. It is better to provide labeled data in your domain and train a model for your applications.
@wolfshow Do you mean that the left-to-right, top-to-bottom order is customized for each dataset? If I want to evaluate the pretrained model on a real-world image, using Tesseract to get the OCR results, how should I organize the bboxes to match the left-to-right, top-to-bottom order the pretrained model expects?
LayoutReader is trained with a left-to-right, top-to-bottom order, so it is recommended to follow a similar setting in real applications. I assume the difference you mentioned arises because the bboxes are not well aligned in real applications: for example, two words on the same line may differ slightly in their y-coordinates.
Also, I assume the bboxes are expressed as [x0, y0, x1, y1], so if you sort the words using bboxes = sorted(bboxes, key=lambda x: [x[0], x[1]]), you are actually sorting them in column-first order, because x0 is compared before y0.
So please double-check your code and try a soft mechanism when sorting the bboxes, for example treating words whose y-coordinates differ only slightly as belonging to the same line. Then see the result.
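One possible reading of the "soft mechanism" suggested above is sketched here. This is not the preprocessing used for ReadingBank or LayoutReader; it is only an illustration under several assumptions: boxes are [x0, y0, x1, y1] with y growing downward, the page is a single column, and two words belong to the same line when their vertical centers differ by less than a fraction of the line height. The function name sort_reading_order and the parameter y_tol_ratio are hypothetical.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: [x0, y0, x1, y1], y grows downward


def center_y(box: Box) -> float:
    """Vertical center of a box."""
    return (box[1] + box[3]) / 2


def sort_reading_order(bboxes: List[Box], y_tol_ratio: float = 0.5) -> List[Box]:
    """Order boxes top-to-bottom, then left-to-right, tolerating small y jitter.

    y_tol_ratio is a hypothetical tuning knob: a box joins the current line
    when its vertical center is within y_tol_ratio * (mean box height of that
    line) of the line's mean vertical center.
    """
    lines: List[List[Box]] = []
    # Visit boxes from top to bottom so lines are built in page order.
    for box in sorted(bboxes, key=center_y):
        if lines:
            current = lines[-1]
            line_cy = sum(center_y(b) for b in current) / len(current)
            line_h = sum(b[3] - b[1] for b in current) / len(current)
            if abs(center_y(box) - line_cy) <= y_tol_ratio * line_h:
                current.append(box)
                continue
        # Too far below the current line (or no line yet): start a new line.
        lines.append([box])
    # Within each line, order words by their left edge.
    return [box for line in lines for box in sorted(line, key=lambda b: b[0])]


if __name__ == "__main__":
    # Two lines of two words each, with a 2-pixel vertical jitter.
    words = [
        [120, 12, 180, 30],
        [10, 10, 100, 28],
        [10, 50, 90, 68],
        [100, 48, 170, 66],
    ]
    for box in sort_reading_order(words):
        print(box)
```

On this sample, a strict sort by (y0, x0) would place [100, 48, 170, 66] before [10, 50, 90, 68] because of the 2-pixel jitter, while the grouped sort keeps each line in left-to-right order. Multi-column pages would need an extra step to separate columns first, which this sketch does not attempt.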
Will you open-source the code that was used to build the dataset?