[models] Pretrained artefact detection model isn't that robust

Open fg-mindee opened this issue 2 years ago • 2 comments

The current pretrained artefact detection model was trained on a fully synthetic dataset. While this comes with several advantages, the dataset has a distribution that is still a bit far off from real-world data.

This is almost a product question, but users can run this on any of:

  • source PDF/documents
  • scanned documents
  • pictures of documents

While the model performs quite well on the first two use cases, the last one sometimes has precision issues (especially with the background):

image

To tackle this, perhaps we should improve the dataset or add augmentations that insert backgrounds instead of zero padding for geometric transforms (rotation, perspective, etc.).
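To make the augmentation idea concrete, here is a rough sketch (not docTR code) of a rotation that fills the uncovered corners from a background image rather than with zeros. The function name `rotate_with_background`, the nested-list grayscale representation and the nearest-neighbour sampling are all assumptions chosen to keep the example dependency-free; a real pipeline would more likely do this with array operations in the training framework.

```python
import math


def rotate_with_background(img, background, angle_deg):
    """Rotate a grayscale image about its center, filling gaps from `background`.

    `img` and `background` are hypothetical H x W nested lists of pixel values.
    Returns the augmented image and the number of pixels taken from `img`.
    """
    h, w = len(img), len(img[0])
    if len(background) != h or len(background[0]) != w:
        raise ValueError("background must match the image size")

    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0

    # Start from the background: pixels not covered by the rotated
    # document keep the background value instead of becoming zero.
    out = [row[:] for row in background]
    covered = 0
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = round(cos_t * dx + sin_t * dy + cx)
            sy = round(-sin_t * dx + cos_t * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
                covered += 1
    return out, covered
```

The same compositing step would apply to perspective warps: compute which output pixels map back inside the source image, and take everything else from a sampled background (a desk or table texture, for instance).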

What do you think @SiddhantBahuguna?

cc @fharper

fg-mindee avatar Mar 17 '22 16:03 fg-mindee

Thanks for the issue FG :) I agree, more data for further fine-tuning will greatly help :) One more thing: maybe we can also set some geometric restrictions (aspect ratio and relative area, of the logo in particular, with respect to the entire page). I implemented that along with post-NMS and it did improve precision (sorry, I don't remember the exact perf increase). So, as a starting point, I suggest we gather a dataset of around 2k more images for fine-tuning? That dataset may not be accessible to the public because of restrictions, though. In the meantime, I will work on the padding :) Will update you in 1~2 weeks on the same. Thanks!
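For illustration, the geometric restrictions could look roughly like the sketch below, applied to detections that have already gone through NMS. This is not the implementation mentioned above: the function name `filter_by_geometry`, the `(xmin, ymin, xmax, ymax, score)` box format and every threshold value are assumptions and would need tuning on real data.

```python
def filter_by_geometry(
    boxes,
    page_w,
    page_h,
    min_rel_area=0.0005,  # hypothetical threshold
    max_rel_area=0.2,     # hypothetical threshold
    min_aspect=0.2,       # hypothetical threshold (width / height)
    max_aspect=5.0,       # hypothetical threshold (width / height)
):
    """Keep detections whose shape is plausible relative to the page.

    `boxes` is an iterable of (xmin, ymin, xmax, ymax, score) in pixels.
    """
    page_area = float(page_w * page_h)
    kept = []
    for xmin, ymin, xmax, ymax, score in boxes:
        bw, bh = xmax - xmin, ymax - ymin
        if bw <= 0 or bh <= 0:
            continue  # degenerate box
        rel_area = (bw * bh) / page_area
        aspect = bw / bh
        if min_rel_area <= rel_area <= max_rel_area and min_aspect <= aspect <= max_aspect:
            kept.append((xmin, ymin, xmax, ymax, score))
    return kept
```

A box covering most of the page, or a very thin strip along an edge, is the kind of background false positive this would discard, regardless of its confidence score.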

SiddhantBahuguna avatar Mar 17 '22 16:03 SiddhantBahuguna

@SiddhantBahuguna any update ? :)

felixdittrich92 avatar May 24 '22 18:05 felixdittrich92

This is now on the user end with the contrib module.

felixdittrich92 avatar Apr 25 '24 16:04 felixdittrich92