How to do object detection in a web page?
Given a screenshot of a web page, can a model identify the web elements in that screenshot? Web elements could be:
- A Table
- A Drop Down Menu
- A Numbered List
- A Bulleted List
- A Radio Button
- ....
Can we use these models to detect web layout?
This depends on the labeled data you have. If you only have labeled screenshots, you can fine-tune the DiT model for object detection. Or, if you have the web page source code, MarkupLM is the model you may want to try.
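For the screenshot route, detection fine-tuning needs box annotations for each image. Below is a minimal sketch of packing labeled element boxes into a COCO-style JSON structure, which is a common annotation format for detection frameworks. Whether your training setup expects exactly this layout is an assumption, so check it against the loader you use. The category names, the `to_coco` helper, and the `pages` input layout are all hypothetical and chosen only for illustration; the boxes are assumed to be in screenshot pixel coordinates as (x, y, width, height).

```python
import json

# Hypothetical label set, taken from the element types listed in the question.
CATEGORIES = ["table", "dropdown", "numbered_list", "bulleted_list", "radio_button"]


def to_coco(pages):
    """Convert per-screenshot element boxes into a COCO-style dict.

    `pages` is a list of dicts with keys `file_name`, `width`, `height`, and
    `elements`, where each element is a (label, x, y, w, h) tuple in pixels.
    """
    cat_ids = {name: i + 1 for i, name in enumerate(CATEGORIES)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": cid, "name": name} for name, cid in cat_ids.items()],
    }
    ann_id = 1
    for image_id, page in enumerate(pages, start=1):
        coco["images"].append({
            "id": image_id,
            "file_name": page["file_name"],
            "width": page["width"],
            "height": page["height"],
        })
        for label, x, y, w, h in page["elements"]:
            # Clip each box to the screenshot so partly visible elements stay valid.
            x0, y0 = max(0, x), max(0, y)
            x1 = min(page["width"], x + w)
            y1 = min(page["height"], y + h)
            if x1 <= x0 or y1 <= y0:
                continue  # element lies entirely outside the screenshot
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": cat_ids[label],
                "bbox": [x0, y0, x1 - x0, y1 - y0],  # COCO uses [x, y, width, height]
                "area": (x1 - x0) * (y1 - y0),
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


# Made-up example page: one table, one radio button cut off at the right edge,
# and one dropdown that is completely off screen.
pages = [{
    "file_name": "page_0001.png",
    "width": 1280,
    "height": 720,
    "elements": [
        ("table", 40, 100, 600, 300),
        ("radio_button", 1270, 50, 20, 20),
        ("dropdown", 1400, 10, 50, 20),
    ],
}]
coco = to_coco(pages)
print(json.dumps(coco["annotations"], indent=2))
```

Clipping to the screenshot bounds and dropping off-screen elements is a design choice for this sketch; you may prefer to discard partly visible elements instead.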
@wolfshow I have access to screenshots, HTML, and DOM.
Do you have examples of the fine-tuning steps for these models?
Hi, I have the same question about fine-tuning DiT. Could you also specify the GPU requirements?