How to approach model distillation for creating a smaller + faster model
I am interested in an implementation of knowledge distillation for this specific model. This technique would allow us to transfer the knowledge and performance of a larger, resource-intensive model (the "teacher") to a smaller, more lightweight counterpart (the "student").
Any input from the community on this would be really helpful. How should I approach this problem?
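For concreteness, here is the kind of thing I have in mind — a minimal sketch of plain response-based distillation (Hinton-style soft targets) for a classification head. `teacher`, `student`, and the loop variables are placeholders, not models from this repo:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style KD: soften both logit sets with temperature T,
    match them via KL divergence, and mix in the ordinary CE loss."""
    soft_targets = F.log_softmax(teacher_logits / T, dim=-1)
    soft_preds = F.log_softmax(student_logits / T, dim=-1)
    # T**2 rescales gradients so the soft term stays comparable to CE
    kd = F.kl_div(soft_preds, soft_targets,
                  reduction="batchmean", log_target=True) * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

def train_step(student, teacher, optimizer, images, labels):
    teacher.eval()
    with torch.no_grad():              # teacher is frozen
        t_logits = teacher(images)
    s_logits = student(images)
    loss = distillation_loss(s_logits, t_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The temperature `T` and mixing weight `alpha` are the usual knobs to tune; higher `T` exposes more of the teacher's "dark knowledge" in the non-argmax classes.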
PS: I got this idea from PaddleStructure v2, where they used FGD (Focal and Global Knowledge Distillation for Detectors) for model size reduction. Source: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/ppstructure/docs/models_list_en.md
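FGD itself distills detector feature maps with focal (foreground/background-weighted) and global (context) terms, which is involved to reproduce faithfully; below is only a hedged sketch of the generic feature-imitation idea it builds on. It assumes both networks expose backbone/neck feature maps, and the 1x1 `adapter` for channel mismatch is my addition, not part of FGD:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureImitation(nn.Module):
    """Simplified feature distillation: project student features to the
    teacher's channel width, then penalize the L2 gap. FGD adds focal
    (fg/bg-weighted) and global (context) terms on top of this idea."""
    def __init__(self, student_ch: int, teacher_ch: int):
        super().__init__()
        self.adapter = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor):
        f_s = self.adapter(f_student)
        if f_s.shape[-2:] != f_teacher.shape[-2:]:
            # align spatial sizes if the two networks downsample differently
            f_s = F.interpolate(f_s, size=f_teacher.shape[-2:],
                                mode="bilinear", align_corners=False)
        return F.mse_loss(f_s, f_teacher.detach())

# usage: total_loss = detection_loss + lambda_feat * imitation(f_s, f_t)
```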
I got the same idea today haha, let me know if you've already implemented this, @mllife.
Hello @dimitri009. You can train your own custom object detection model that is faster, e.g. RT-DETR or the newer YOLO v11: https://docs.ultralytics.com/models/rtdetr/ (rough usage sketch below). Personally, I have since moved to TableFormer (https://github.com/DS4SD/docling-ibm-models), which comes in a light and a fat version, so you can pick whichever fits your preference.
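For reference, a minimal training sketch with the ultralytics API — `coco8.yaml` is just the toy dataset config from their docs, so swap in your own data config:

```python
from ultralytics import RTDETR, YOLO

# RT-DETR: transformer-based real-time detector, pretrained checkpoint
model = RTDETR("rtdetr-l.pt")
model.train(data="coco8.yaml", epochs=50, imgsz=640)

# or the small YOLO11 nano model, same training interface
model = YOLO("yolo11n.pt")
model.train(data="coco8.yaml", epochs=50, imgsz=640)
metrics = model.val()  # evaluate on the validation split
```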