How to approach model distillation for creating a smaller + faster model
I am interested in an implementation of knowledge distillation for this specific model. This technique would allow us to transfer the knowledge and performance of a larger, resource-intensive model (the "teacher") to a smaller, more lightweight counterpart (the "student").
Any input from the community on this would be really helpful. How should I approach this problem?
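For concreteness, here is the kind of thing I have in mind — a minimal sketch of plain response-based distillation (Hinton-style soft targets) for a classification head. `teacher`, `student`, and the loop variables are placeholders, not models from this repo:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style KD: soften both logit sets with temperature T,
    match them via KL divergence, and mix in the ordinary CE loss."""
    soft_targets = F.log_softmax(teacher_logits / T, dim=-1)
    soft_preds = F.log_softmax(student_logits / T, dim=-1)
    # T**2 rescales gradients so the soft term stays comparable to CE
    kd = F.kl_div(soft_preds, soft_targets,
                  reduction="batchmean", log_target=True) * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

def train_step(student, teacher, optimizer, images, labels):
    teacher.eval()
    with torch.no_grad():              # teacher is frozen
        t_logits = teacher(images)
    s_logits = student(images)
    loss = distillation_loss(s_logits, t_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The temperature `T` and mixing weight `alpha` are the usual knobs to tune; higher `T` exposes more of the teacher's "dark knowledge" in the non-argmax classes.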
PS: I got this idea from PaddleStructure v2, where they used FGD (Focal and Global Knowledge Distillation for Detectors) for model size reduction. Source: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/ppstructure/docs/models_list_en.md
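FGD itself distills detector feature maps with focal (foreground/background-weighted) and global (context) terms, which is involved to reproduce faithfully; below is only a hedged sketch of the generic feature-imitation idea it builds on. It assumes both networks expose backbone/neck feature maps, and the 1x1 `adapter` for channel mismatch is my addition, not part of FGD:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureImitation(nn.Module):
    """Simplified feature distillation: project student features to the
    teacher's channel width, then penalize the L2 gap. FGD adds focal
    (fg/bg-weighted) and global (context) terms on top of this idea."""
    def __init__(self, student_ch: int, teacher_ch: int):
        super().__init__()
        self.adapter = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor):
        f_s = self.adapter(f_student)
        if f_s.shape[-2:] != f_teacher.shape[-2:]:
            # align spatial sizes if the two networks downsample differently
            f_s = F.interpolate(f_s, size=f_teacher.shape[-2:],
                                mode="bilinear", align_corners=False)
        return F.mse_loss(f_s, f_teacher.detach())

# usage: total_loss = detection_loss + lambda_feat * imitation(f_s, f_t)
```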
I got the same idea today haha, let me know if you've already implemented this, @mllife.
Hello @dimitri009. You can train your own custom object detection model that is faster, e.g. RT-DETR or the newer YOLO v11: https://docs.ultralytics.com/models/rtdetr/ (rough usage sketch below). Personally, I have since moved to TableFormer (https://github.com/DS4SD/docling-ibm-models), which comes in a light and a fat version, so you can pick whichever fits your preference.
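For reference, a minimal training sketch with the ultralytics API — `coco8.yaml` is just the toy dataset config from their docs, so swap in your own data config:

```python
from ultralytics import RTDETR, YOLO

# RT-DETR: transformer-based real-time detector, pretrained checkpoint
model = RTDETR("rtdetr-l.pt")
model.train(data="coco8.yaml", epochs=50, imgsz=640)

# or the small YOLO11 nano model, same training interface
model = YOLO("yolo11n.pt")
model.train(data="coco8.yaml", epochs=50, imgsz=640)
metrics = model.val()  # evaluate on the validation split
```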