doctr [docs] Add a section about artefact detection (barcode / faces)

Following up on #968, I just realized that there is no section for artefact detection in the documentation. While the object detection module for this is only available in PyTorch for now, I'd argue we could document the OpenCV wrapper we made a long time ago!

Here is the section to document: https://github.com/mindee/doctr/tree/main/doctr/models/artefacts

Here is what I propose:

[ ] Add an example to the docstrings of those objects
[ ] Add a section in https://github.com/mindee/doctr/blob/main/docs/source/modules/models.rst for this

Jul 01 '22 16:07 frgfm

@odulcy-mindee @frgfm Not sure if we still need this? For additional features like this (which have nothing to do with OCR), I think a contribution module would be better. It should be backend independent, i.e. models exported to ONNX and loaded with either OpenCV or onnxruntime.
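
To make the backend independence idea concrete, here is a minimal sketch of loading an ONNX file with whichever runtime happens to be installed. The helper names `pick_onnx_backend` and `load_onnx_model` are hypothetical and not part of docTR; the only library calls assumed are `onnxruntime.InferenceSession` and `cv2.dnn.readNetFromONNX`.

```python
from importlib.util import find_spec


def pick_onnx_backend(preferred=("onnxruntime", "cv2")):
    """Return the name of the first importable module in `preferred`, or None."""
    for name in preferred:
        if find_spec(name) is not None:
            return name
    return None


def load_onnx_model(path, backend=None):
    """Hypothetical helper: load an ONNX file without depending on torch or TF."""
    backend = backend or pick_onnx_backend()
    if backend == "onnxruntime":
        import onnxruntime as ort  # imported lazily so it stays optional

        return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    if backend == "cv2":
        import cv2  # OpenCV's dnn module can also read ONNX graphs

        return cv2.dnn.readNetFromONNX(path)
    raise ImportError("Install onnxruntime or opencv-python to load ONNX models")
```

The lazy imports keep both runtimes optional, so the module itself can be imported even when neither is installed.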

Feb 09 '24 07:02 felixdittrich92

@felixdittrich92 Do you have something in mind for the contribution module? Do you want to move the current artefacts module somewhere else?

Feb 13 '24 10:02 odulcy-mindee

@odulcy-mindee Yeah, there is a lot of room for additional features like object detection (maybe zero-shot) instead of the current Faster R-CNN, which would require some more data, or document unskewing like DocTr++ (https://github.com/fh2019ustc/DocTr-Plus).

The only requirement I see is that it should be backend independent, kept as an extra, for example pip install python-doctr[torch + extra] (where extra would currently be only onnxruntime), and easy to integrate into the current pipeline.

For example:

from doctr.io import DocumentFile
from doctr.contribution import ArtefactDetector, ...
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, extensions=[])
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
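
The `extensions=[]` argument in the snippet above is only a proposal, so as a rough sketch of how such a hook could behave: each extension is assumed to be a plain callable that takes a page and returns a dict, and its output is attached to the OCR result for that page. `run_with_extensions`, `fake_ocr` and `count_artefacts` are made-up names for illustration, and the OCR predictor is replaced by a stand-in function.

```python
from typing import Any, Callable, Dict, List

# Assumption: an extension is any callable mapping a page to a dict of extra results
Extension = Callable[[Any], Dict[str, Any]]


def run_with_extensions(
    ocr: Callable[[List[Any]], List[Dict[str, Any]]],
    pages: List[Any],
    extensions: List[Extension],
) -> List[Dict[str, Any]]:
    """Run OCR, then attach each extension's output under result["extras"]."""
    results = ocr(pages)
    for page, result in zip(pages, results):
        extras = result.setdefault("extras", {})
        for ext in extensions:
            # Key each output by the callable's name (or its class name)
            name = getattr(ext, "__name__", type(ext).__name__)
            extras[name] = ext(page)
    return results


def fake_ocr(pages):
    # Stand-in for a real predictor: one result dict per page
    return [{"text": f"page {i}"} for i, _ in enumerate(pages)]


def count_artefacts(page):
    # Stand-in for an artefact detector: pretends each item on the page is an artefact
    return {"artefacts": len(page)}


out = run_with_extensions(fake_ocr, [[1, 2], [3]], [count_artefacts])
```

Keeping extensions as plain callables means they never need to know which deep learning backend produced the OCR result.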

But yeah that's something we could tackle if the other things are resolved :sweat_smile:

Feb 13 '24 11:02 felixdittrich92

Done with contrib module

Apr 25 '24 16:04 felixdittrich92