unilm
How can LayoutLMv3 be trained at the character level?
Model I am using: LayoutLMv3. The model is trained with one entity label per bbox, but my entities are only part of the text inside a bbox, and a single bbox can contain multiple entities. How should I deal with this situation?
Sounds like you should look at the transformers implementation of LayoutLMv3.
https://huggingface.co/docs/transformers/main/en/model_doc/layoutlmv3#transformers.LayoutLMv3ForTokenClassification
When doing token classification, the model predicts a label for each token, based on the token itself and the bounding box supplied for that token by the OCR step.
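Because labels are assigned per token rather than per bbox, one way to handle a bbox that holds several entities is to split its text into words, let every word reuse the bbox, and give each word its own label. Here is a rough sketch of that preparation step in plain Python. The helper `expand_segment`, the character-span format, and the sample invoice line are all made up for illustration; lists of this shape are, as far as I know, what the transformers processor accepts as words, `boxes`, and `word_labels` when its built-in OCR is turned off.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), assumed normalized to 0-1000
Span = Tuple[int, int, str]       # (start_char, end_char, entity name), hypothetical format


def expand_segment(text: str, box: Box, spans: List[Span]):
    """Split one OCR segment into words that share its box, with BIO labels."""
    words, boxes, labels = [], [], []
    started = set()   # indices of spans that already received a B- tag
    cursor = 0
    for word in text.split():
        start = text.index(word, cursor)   # character offset of this word
        end = start + len(word)
        cursor = end
        label = "O"
        for i, (s, e, name) in enumerate(spans):
            if start < e and end > s:      # word overlaps the entity span
                label = ("I-" if i in started else "B-") + name
                started.add(i)
                break
        words.append(word)
        boxes.append(box)                  # every word reuses the segment box
        labels.append(label)
    return words, boxes, labels


# Hypothetical segment: one bbox containing two entities.
text = "Invoice No 12345 Date 2023-01-05"
box = (100, 50, 600, 80)
number_start = text.index("12345")
date_start = text.index("2023-01-05")
spans = [
    (number_start, number_start + len("12345"), "NUMBER"),
    (date_start, date_start + len("2023-01-05"), "DATE"),
]

words, boxes, labels = expand_segment(text, box, spans)
print(list(zip(words, labels)))
```

If an entity starts or ends in the middle of a word, this sketch labels the whole word; going finer than that would require splitting the word itself before labeling.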
You can then add a post-processing step that aggregates the predicted labels into groups that suit your needs.
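A minimal sketch of such an aggregation step, assuming BIO-style labels and assuming the predictions have already been reduced to one label per word (for example by keeping the prediction of each word's first sub-token, which a fast tokenizer's `word_ids()` can help with). The function name `group_entities` and the sample data are hypothetical.

```python
from typing import List, Tuple


def group_entities(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge consecutive B-/I- tagged words into (entity name, text) pairs."""
    entities = []
    current_name, current_words = None, []

    def flush():
        # Store the entity collected so far, if any.
        if current_name is not None:
            entities.append((current_name, " ".join(current_words)))

    for word, label in zip(words, labels):
        prefix, _, name = label.partition("-")
        if prefix == "I" and name == current_name:
            current_words.append(word)      # continue the open entity
        elif prefix in ("B", "I"):
            flush()                         # B- tag, or a stray I- tag: start a new entity
            current_name, current_words = name, [word]
        else:
            flush()                         # "O" closes any open entity
            current_name, current_words = None, []
    flush()
    return entities


# Hypothetical word-level predictions for words that all came from one bbox.
pred_words = ["Invoice", "No", "12345", "Date", "5", "Jan", "2023"]
pred_labels = ["O", "O", "B-NUMBER", "O", "B-DATE", "I-DATE", "I-DATE"]

entities = group_entities(pred_words, pred_labels)
print(entities)
```

A stray `I-` tag without a preceding `B-` is treated here as the start of a new entity; whether to keep or drop such fragments is a choice that depends on your data.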