unilm
Adding multiple classification heads to train in single model
I want to train a single model with one shared architecture and two different classification heads: one to detect document layout elements (table, text, title, figure, etc.) and the other to detect cells inside tables. Right now I have two separate models, one for layout and one for table cells, both based on the same architecture. Since the architecture is the same for both use cases, how can I train one model that handles both layout and table-cell detection?
NOTE: I already tried using the OCR coordinates of the text inside tables, but the results were not good enough, so I don't want to use that approach.
@Atul997 First question: how accurate are the two separate models? Are they good enough?
@wolfshow Yes, the accuracy of both models is good enough.
What's the backbone network of these two models?
A ViT backbone for both models, as given in the PubLayNet and ICDAR configs.
I think these two models are somewhat incompatible with each other: the DiT model for PubLayNet aims to detect large objects, while the table-cell model is trying to locate small objects. You could also try text-side heads using LayoutLM.
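For reference, the "one backbone, two heads" setup being discussed can be sketched roughly as follows. This is a minimal PyTorch illustration, not the DiT or detectron2 code: the class name `TwoHeadModel`, the tiny convolutional stand-in for the ViT backbone, and the plain linear classification heads are all assumptions made to keep the example short. Real layout and cell detectors would use full detection heads instead.

```python
import torch
from torch import nn


class TwoHeadModel(nn.Module):
    """Toy shared-backbone model with one head per task (hypothetical)."""

    def __init__(self, num_layout_classes=5, num_cell_classes=2, feat_dim=64):
        super().__init__()
        # Stand-in for the shared image backbone (a ViT in this thread).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # One task-specific head per label space.
        self.heads = nn.ModuleDict({
            "layout": nn.Linear(feat_dim, num_layout_classes),
            "cell": nn.Linear(feat_dim, num_cell_classes),
        })

    def forward(self, images, task):
        # Both tasks share the backbone features; only the head differs.
        feats = self.backbone(images)
        return self.heads[task](feats)


model = TwoHeadModel()
batch = torch.zeros(2, 3, 32, 32)
layout_logits = model(batch, "layout")  # shape (2, 5)
cell_logits = model(batch, "cell")      # shape (2, 2)
```

During training, each batch would be routed to the head that matches its dataset, so the backbone receives gradients from both tasks. The object-scale mismatch mentioned above still applies to the shared features.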
Well, I can try that, but I don't want to include text in training; I want to use images only. Also, if possible, I can go with either one of the two configurations, depending on performance.
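One image-only way to use a single configuration would be to merge the two annotation sets into one label space, so that "cell" becomes one more category next to table, text, title and figure. Below is a minimal sketch assuming both datasets are COCO-style dictionaries with `images`, `annotations` and `categories` keys. The helper name `merge_coco` is hypothetical and this is not part of the unilm repository.

```python
def merge_coco(*datasets):
    """Merge COCO-style dicts into one dataset with a shared category list.

    Categories are matched by name; image and annotation ids are
    renumbered so they stay unique in the merged dataset.
    """
    merged = {"images": [], "annotations": [], "categories": []}
    cat_by_name = {}          # category name -> new category id
    next_img, next_ann = 1, 1

    for ds in datasets:
        # Map this dataset's category ids onto the shared id space.
        cat_map = {}
        for cat in ds["categories"]:
            if cat["name"] not in cat_by_name:
                new_id = len(cat_by_name) + 1
                cat_by_name[cat["name"]] = new_id
                merged["categories"].append({"id": new_id, "name": cat["name"]})
            cat_map[cat["id"]] = cat_by_name[cat["name"]]

        # Renumber images, remembering old id -> new id.
        img_map = {}
        for img in ds["images"]:
            img_map[img["id"]] = next_img
            merged["images"].append({**img, "id": next_img})
            next_img += 1

        # Rewrite annotations to point at the new image and category ids.
        for ann in ds["annotations"]:
            merged["annotations"].append({
                **ann,
                "id": next_ann,
                "image_id": img_map[ann["image_id"]],
                "category_id": cat_map[ann["category_id"]],
            })
            next_ann += 1

    return merged
```

One caveat with this approach: images from the layout dataset carry no cell annotations, so any cells inside their tables would be treated as background during training (and likewise for layout regions in the cell dataset), unless those images are also annotated for the other task.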