[docs] Add a section about artefact detection (barcode / faces)

Open frgfm opened this issue 3 years ago • 3 comments

Following up on #968, I just realized that there is no section for artefact detection in the documentation. While the object detection module for this is only available in PyTorch for now, I'd argue we could document the OpenCV wrapper we made a long time ago!

Here is the section to document: https://github.com/mindee/doctr/tree/main/doctr/models/artefacts

Here is what I propose:

  • [ ] Add an example to the docstrings of those objects
  • [ ] Add a section in https://github.com/mindee/doctr/blob/main/docs/source/modules/models.rst for this

frgfm avatar Jul 01 '22 16:07 frgfm

@odulcy-mindee @frgfm Not sure if we still need this!? For such additional features (which have nothing to do with OCR), I think a contribution module would be better. It should be backend independent, i.e. ONNX models loaded with either OpenCV or onnxruntime.

felixdittrich92 avatar Feb 09 '24 07:02 felixdittrich92

@felixdittrich92 Do you have something in mind for the contribution module? Do you want to move the current artefacts module somewhere else?

odulcy-mindee avatar Feb 13 '24 10:02 odulcy-mindee

@odulcy-mindee Yeah, there is a lot of room for such additional features, like object detection (maybe zero-shot) instead of the current Faster R-CNN, which would require some more data, or document unskewing like DocTr++ (https://github.com/fh2019ustc/DocTr-Plus).

The only requirement I see is that it should be backend independent, kept as an extra, for example `pip install python-doctr[torch,extra]` (where the extra in this case would only be `onnxruntime` at the moment), and easy to integrate into the current pipeline.
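One way the "kept as an extra" requirement could be handled is a small import guard that fails early with an install hint. This is only a minimal sketch: `is_available`, `require_extra`, `ContribModel`, and the extra name `contrib` are hypothetical names used for illustration, not existing docTR API.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be imported here."""
    return importlib.util.find_spec(module_name) is not None


def require_extra(module_name: str, extra: str) -> None:
    """Raise an ImportError with an install hint when an optional dependency is missing.

    `extra` is the (hypothetical) name of the pip extra that would provide it.
    """
    if not is_available(module_name):
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'python-doctr[{extra}]'"
        )


class ContribModel:
    """Hypothetical base class for contribution models that run through onnxruntime."""

    def __init__(self) -> None:
        # Fail at construction time rather than deep inside inference code.
        require_extra("onnxruntime", extra="contrib")
```

With a guard like this, the core package would not need to import onnxruntime at all unless a contribution model is actually instantiated.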

For example, usage could look like this:

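```python
from doctr.io import DocumentFile
# Proposed module (does not exist yet); other contribution models would be imported here too
from doctr.contribution import ArtefactDetector
from doctr.models import ocr_predictor

# `extensions` is a proposed argument, not part of the current API
model = ocr_predictor(pretrained=True, extensions=[])
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
```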
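To make the `extensions` idea a bit more concrete, here is a rough, backend-independent sketch of how a predictor could run extra detectors next to OCR. Everything in it is an assumption for illustration: `PredictorWithExtensions`, the callable contract for an extension, and the stand-in `fake_ocr` and `barcode_detector` functions are hypothetical and do not reflect how docTR implements this.

```python
from typing import Any, Callable, Dict, List, Sequence

# A page is left abstract here; in practice it would be an image array.
Page = Any
# Hypothetical contract: an extension maps one page to a list of detections.
Extension = Callable[[Page], List[Dict[str, Any]]]


class PredictorWithExtensions:
    """Hypothetical wrapper that runs extra detectors alongside an OCR callable."""

    def __init__(
        self,
        ocr: Callable[[Sequence[Page]], List[Any]],
        extensions: Sequence[Extension] = (),
    ) -> None:
        self.ocr = ocr
        self.extensions = list(extensions)

    def __call__(self, pages: Sequence[Page]) -> List[Dict[str, Any]]:
        ocr_results = self.ocr(pages)
        outputs = []
        for page, ocr_result in zip(pages, ocr_results):
            # Each extension only sees the raw page, so it stays independent of the OCR backend.
            extras = {
                getattr(ext, "__name__", type(ext).__name__): ext(page)
                for ext in self.extensions
            }
            outputs.append({"ocr": ocr_result, "extensions": extras})
        return outputs


# Stand-in callables instead of real models
def fake_ocr(pages):
    return [f"text of page {i}" for i, _ in enumerate(pages)]


def barcode_detector(page):
    return [{"label": "bar_code", "box": (0.1, 0.1, 0.4, 0.2)}]


predictor = PredictorWithExtensions(fake_ocr, extensions=[barcode_detector])
result = predictor(["page-0", "page-1"])
```

Keeping extensions as plain callables means an ONNX-based detector, an OpenCV wrapper, or any other function could be plugged in without the predictor knowing which backend it uses.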

But yeah that's something we could tackle if the other things are resolved :sweat_smile:

felixdittrich92 avatar Feb 13 '24 11:02 felixdittrich92

Done with contrib module

felixdittrich92 avatar Apr 25 '24 16:04 felixdittrich92