unilm Adding multiple classification heads to train in single model

Open Atul997 opened this issue 2 years ago • 6 comments

I want to train one model with two different classification heads on the same architecture: one head would detect the layout of documents (table, text, title, figure, etc.) and the other would detect cells inside tables. Right now I have two separate models, one for layout and one for table cells, and both use the same architecture. Since the architecture is shared across both use cases, how can I train a single model that combines layout detection and table-cell detection?

NOTE: I already tried using the OCR coordinates of the text inside tables, but the results were not good enough, so I don't want to use that approach.

Jun 21 '22 09:06 Atul997

@Atul997 First question: how accurate are the two separate models? Are they good enough?

Jun 22 '22 03:06 wolfshow

@wolfshow Yes, the accuracy of both models is good enough.

Jun 22 '22 04:06 Atul997

What's the backbone network of these two models?

Jun 22 '22 06:06 wolfshow

A ViT backbone for both models, as given in the publaynet and icdar configs.

Jun 22 '22 06:06 Atul997

I think these two models are a little incompatible with each other, as the DiT model for PubLayNet aims to detect large objects while the table-cell model is trying to locate small objects. You may also try heads from the text side, using LayoutLM.

Jun 22 '22 06:06 wolfshow

Well, I can try, but I don't want to include text in training; I just want to try with images only. Also, if possible, I can go with either one of the configurations, depending on performance.

Jun 22 '22 06:06 Atul997
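
For readers landing on this thread, the general "one backbone, several heads" idea from the question can be sketched in plain PyTorch. This is only a toy illustration under stated assumptions, not the unilm or detectron2 API: `TwoHeadModel` and `multitask_loss` are hypothetical names, a tiny conv encoder stands in for the ViT backbone, and simple linear classification heads stand in for the real detection heads. It also does nothing about the large-object vs. small-object mismatch mentioned above.

```python
import torch
from torch import nn


class TwoHeadModel(nn.Module):
    """Hypothetical shared image backbone with one head per task."""

    def __init__(self, feat_dim=64, num_layout_classes=5, num_cell_classes=2):
        super().__init__()
        # Assumption: a tiny conv encoder stands in for the shared ViT backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Assumption: linear classifiers stand in for the detection heads.
        self.heads = nn.ModuleDict({
            "layout": nn.Linear(feat_dim, num_layout_classes),
            "cell": nn.Linear(feat_dim, num_cell_classes),
        })

    def forward(self, images):
        feats = self.backbone(images)  # computed once, shared by both heads
        return {name: head(feats) for name, head in self.heads.items()}


def multitask_loss(outputs, targets, weights=None):
    """Weighted sum of per-head cross-entropy; heads without labels are skipped."""
    weights = weights or {}
    total = 0.0
    for task, logits in outputs.items():
        if task in targets:
            ce = nn.functional.cross_entropy(logits, targets[task])
            total = total + weights.get(task, 1.0) * ce
    return total


torch.manual_seed(0)
model = TwoHeadModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# A batch drawn from the layout dataset only (random stand-in data):
# the backbone and layout head are updated, the cell head is left untouched.
images = torch.randn(4, 3, 32, 32)
targets = {"layout": torch.randint(0, 5, (4,))}

outputs = model(images)
loss = multitask_loss(outputs, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Alternating batches between the two datasets, or mixing them and passing both target keys, would train both heads against the same backbone. The per-task `weights` argument is one simple knob for balancing the two losses; whether this works well for layout and table cells together is an open question that the sketch does not answer.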
