
How to do object detection in a web page?

Open · nashid opened this issue 3 years ago · 3 comments

Given a screenshot of a web page, can the model identify the web elements in it? Web elements could be:

  • A Table
  • A Drop Down Menu
  • A Numbered List
  • A Bulleted List
  • A Radio Button
  • ....

Can we use these models to detect web page layout?

nashid, Jul 01 '22 17:07

This depends on the labeled data you have. If you only have labeled screenshots, you can fine-tune the DiT model for object detection. If you have the web page source code, MarkupLM is the model to try.

wolfshow, Jul 02 '22 03:07
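For the MarkupLM route, the model works on the text nodes of the page together with the XPath of each node, rather than on pixels. The snippet below is a dependency-free sketch (not from the maintainers) of what that input looks like: it walks raw HTML with Python's standard `html.parser` and collects `(text, xpath)` pairs. The class name `XPathTextExtractor` is hypothetical. Hugging Face `transformers` ships a `MarkupLMFeatureExtractor` that performs this extraction for real, and its XPath formatting (for example, how sibling subscripts are written) may differ from the always-subscripted paths produced here.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class XPathTextExtractor(HTMLParser):
    """Hypothetical helper: collects (text, xpath) pairs for non-empty text nodes."""

    def __init__(self):
        super().__init__()
        self.stack = []           # open elements as (tag, 1-based sibling index)
        self.child_counts = [{}]  # per nesting level: how many of each child tag seen
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (tolerates some sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                del self.child_counts[i + 1:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            xpath = "".join(f"/{t}[{n}]" for t, n in self.stack)
            self.nodes.append((text, xpath))


html = ("<html><body><ul><li>First</li><li>Second</li></ul>"
        "<table><tr><td>Cell</td></tr></table></body></html>")
extractor = XPathTextExtractor()
extractor.feed(html)
for text, xpath in extractor.nodes:
    print(text, xpath)
```

Note that the two list items get different paths (`li[1]`, `li[2]`), which is what lets a token-classification head on top of MarkupLM tell structurally similar elements apart.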

@wolfshow I have access to screenshots, HTML, and DOM.

Do you have examples of the fine-tuning steps for these models?

nashid, Jul 04 '22 00:07
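With screenshots plus the DOM, one possible way (a sketch, not an official recipe) to get the "labeled screenshots" needed for DiT object detection is to export element bounding boxes from the DOM and write them as COCO-style detection annotations, the format that Detectron2-based detection fine-tuning commonly reads. The sketch assumes the boxes were already extracted (for example with `getBoundingClientRect()` in a headless browser, adjusted for scroll offset on full-page screenshots) and hand-labeled; the category names, the `to_coco` helper, and the input layout are all hypothetical. How the DiT training scripts register a custom dataset is not shown here and should be checked in the repository.

```python
import json

# Hypothetical label set, taken from the element types listed in the question.
CATEGORIES = ["table", "dropdown", "numbered_list", "bulleted_list", "radio_button"]


def to_coco(pages):
    """Hypothetical helper. `pages` is a list of dicts with 'file_name', 'width',
    'height' and 'elements'; each element is {'label', 'x', 'y', 'w', 'h'} in
    screenshot pixel coordinates (top-left origin)."""
    cat_ids = {name: i + 1 for i, name in enumerate(CATEGORIES)}
    images, annotations = [], []
    ann_id = 1
    for img_id, page in enumerate(pages, start=1):
        images.append({"id": img_id, "file_name": page["file_name"],
                       "width": page["width"], "height": page["height"]})
        for el in page["elements"]:
            # Clip the box to the screenshot; drop elements entirely outside it.
            x0, y0 = max(0, el["x"]), max(0, el["y"])
            x1 = min(page["width"], el["x"] + el["w"])
            y1 = min(page["height"], el["y"] + el["h"])
            if x1 <= x0 or y1 <= y0:
                continue
            w, h = x1 - x0, y1 - y0
            annotations.append({"id": ann_id, "image_id": img_id,
                                "category_id": cat_ids[el["label"]],
                                "bbox": [x0, y0, w, h],  # COCO: [x, y, width, height]
                                "area": w * h, "iscrowd": 0})
            ann_id += 1
    categories = [{"id": i, "name": n} for n, i in cat_ids.items()]
    return {"images": images, "annotations": annotations, "categories": categories}


pages = [{"file_name": "page_0001.png", "width": 1280, "height": 800, "elements": [
    {"label": "table", "x": 100, "y": 200, "w": 600, "h": 300},
    {"label": "radio_button", "x": 1270, "y": 50, "w": 20, "h": 20},    # clipped at right edge
    {"label": "bulleted_list", "x": 50, "y": 900, "w": 200, "h": 100},  # below the fold, dropped
]}]
coco = to_coco(pages)
print(json.dumps(coco, indent=2))
```

Clipping rather than discarding partially visible elements keeps boxes for content cut off at the screenshot edge; whether that helps or hurts depends on how the screenshots are captured.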

Hi, I have the same question about fine-tuning DiT. Could you also specify the GPU requirements?

panichevad, Jul 08 '22 07:07