
Adding multiple classification heads to train in a single model

Open Atul997 opened this issue 2 years ago • 6 comments

I want to train a model on a single shared architecture but with two different classification heads: one would detect document layout elements (table, text, title, figure, etc.) and the other would detect cells inside tables. Right now I have two separate models, one for layout and one for table cells, both based on the same architecture. Since both use cases share that architecture, how can I train one single model that handles both layout and table cells?

NOTE: I already tried using the OCR coordinates of the text inside tables, but the results were not good enough, so I don't want to use them.
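
A common alternative to two separate heads is to merge both label spaces so a single detector (one classification head) predicts the layout classes plus a `cell` class. Below is a minimal stdlib sketch, assuming both datasets are COCO-style JSON dicts (`images`, `annotations`, `categories`) and that the same page is matched across sources by `file_name`. `merge_coco` and the toy data are hypothetical, not part of unilm.

```python
def merge_coco(datasets):
    """Merge COCO-style dicts into one dataset with a unified label space.

    Categories are matched by name and images by file_name, so a page that is
    annotated in several sources ends up with all of its boxes in one place.
    IDs are renumbered from 1; all other fields are copied unchanged.
    """
    merged = {"images": [], "annotations": [], "categories": []}
    cat_ids = {}  # category name -> unified category id
    img_ids = {}  # file_name -> unified image id
    next_ann_id = 1
    for ds in datasets:
        # Remap this source's local category ids onto the unified ids.
        local_cat = {}
        for cat in ds["categories"]:
            if cat["name"] not in cat_ids:
                cat_ids[cat["name"]] = len(cat_ids) + 1
                merged["categories"].append({"id": cat_ids[cat["name"]], "name": cat["name"]})
            local_cat[cat["id"]] = cat_ids[cat["name"]]
        # Remap local image ids; an image seen in an earlier source is reused.
        local_img = {}
        for img in ds["images"]:
            if img["file_name"] not in img_ids:
                img_ids[img["file_name"]] = len(img_ids) + 1
                merged["images"].append(dict(img, id=img_ids[img["file_name"]]))
            local_img[img["id"]] = img_ids[img["file_name"]]
        # Copy annotations with remapped annotation, image, and category ids.
        for ann in ds["annotations"]:
            merged["annotations"].append(dict(
                ann,
                id=next_ann_id,
                image_id=local_img[ann["image_id"]],
                category_id=local_cat[ann["category_id"]],
            ))
            next_ann_id += 1
    return merged


# Hypothetical toy data: one page labeled in both sources.
layout = {
    "images": [{"id": 1, "file_name": "page1.png", "width": 1000, "height": 1400}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 4,
                     "bbox": [50, 100, 900, 600], "area": 540000, "iscrowd": 0}],
    "categories": [{"id": i, "name": n} for i, n in
                   enumerate(["text", "title", "list", "table", "figure"], start=1)],
}
cells = {
    "images": [{"id": 7, "file_name": "page1.png", "width": 1000, "height": 1400}],
    "annotations": [{"id": 1, "image_id": 7, "category_id": 1,
                     "bbox": [60, 110, 120, 30], "area": 3600, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "cell"}],
}
merged = merge_coco([layout, cells])
print([c["name"] for c in merged["categories"]])
# ['text', 'title', 'list', 'table', 'figure', 'cell']
```

One caveat with this approach: a page annotated in only one source has no labels for the other object type, so the detector would learn those objects as background. Training only on pages labeled for both layout and cells avoids that.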

Atul997 · Jun 21 '22 09:06

@Atul997 First question: what is the accuracy of the two different models? Is it good enough?

wolfshow · Jun 22 '22 03:06

@wolfshow Yes, the accuracy of both models is good enough.

Atul997 · Jun 22 '22 04:06

What's the backbone network of these two models?

wolfshow · Jun 22 '22 06:06

A ViT backbone for both models, as given in the PubLayNet and ICDAR configs.

Atul997 · Jun 22 '22 06:06

I think these two models are somewhat incompatible with each other: the DiT model for PubLayNet aims to detect large objects, while the table-cell model is trying to locate small objects. You could also try heads on the text side using LayoutLM.
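
The scale argument above can be checked on your own annotations before deciding whether one detector can serve both tasks. A minimal stdlib sketch, assuming COCO-style `bbox` values of `[x, y, width, height]` and borrowing COCO's small/medium/large area thresholds (32² and 96² pixels) as a rough yardstick; `size_bucket`, `scale_profile`, and the toy boxes are hypothetical.

```python
from collections import Counter

SMALL_MAX = 32 ** 2   # COCO convention: "small" boxes have area below 32x32 px
MEDIUM_MAX = 96 ** 2  # COCO convention: "medium" boxes have area below 96x96 px


def size_bucket(area):
    """Return the COCO-style size bucket for a box area in pixels."""
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


def scale_profile(annotations, scale=1.0):
    """Count boxes per size bucket.

    `scale` multiplies both box sides, e.g. to mimic the resize applied to the
    page before it reaches the backbone (hypothetical; use your config's value).
    """
    counts = Counter()
    for ann in annotations:
        _, _, w, h = ann["bbox"]  # COCO bbox format: [x, y, width, height]
        counts[size_bucket((w * scale) * (h * scale))] += 1
    return counts


# Hypothetical toy boxes: layout regions vs. table cells on the same page.
layout_anns = [{"bbox": [0, 0, 900, 600]}, {"bbox": [0, 0, 400, 50]}]
cell_anns = [{"bbox": [0, 0, 60, 20]}, {"bbox": [0, 0, 120, 25]}, {"bbox": [0, 0, 30, 15]}]

print(scale_profile(layout_anns))           # Counter({'large': 2})
print(scale_profile(cell_anns))             # Counter({'medium': 2, 'small': 1})
print(scale_profile(cell_anns, scale=0.5))  # Counter({'small': 3})
```

If the cell boxes land mostly in the small bucket after resizing while the layout boxes are mostly large, that supports the concern that a single set of detector settings may not suit both tasks equally well.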

wolfshow · Jun 22 '22 06:06

Well, I can try, but I don't want to include text in training; I just want to try with images only. Also, if needed, I can go with either one of the configurations depending on performance.

Atul997 · Jun 22 '22 06:06