[docs] Add a section about artefact detection (barcodes / faces)
Following up on #968, I just realized that there is no section for artefact detection in the documentation. While the object detection module for this is only available in PyTorch for now, I'd argue we could document the OpenCV wrapper we made a long time ago!
Here is the section to document: https://github.com/mindee/doctr/tree/main/doctr/models/artefacts
Here is what I propose:
- [ ] Add examples to the docstrings of those objects
- [ ] Add a section in https://github.com/mindee/doctr/blob/main/docs/source/modules/models.rst for this
@odulcy-mindee @frgfm Not sure if we still need this!? For additional features like this (which have nothing to do with OCR), I think a contribution module would be better. It should be backend independent, i.e. ONNX models loaded with either OpenCV or onnxruntime.
@felixdittrich92 Do you have something in mind for the contribution module? Do you want to move the current artefacts module somewhere else?
@odulcy-mindee Yeah, there is a lot of room for such additional features, like object detection (maybe zero-shot) instead of the current Faster R-CNN, which would require some more data, or document unskewing like DocTr++ (https://github.com/fh2019ustc/DocTr-Plus).
The only requirement I see is that it should be backend independent, kept as an extra, for example `pip install python-doctr[torch + extra]` (where extra in this case would only be `onnxruntime` atm), and easy to integrate into the current pipeline.
For example:
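```python
from doctr.io import DocumentFile
# Proposed contribution module (does not exist yet); further extensions elided
from doctr.contribution import ArtefactDetector  # , ...
from doctr.models import ocr_predictor

# `extensions` is a proposed argument, not part of the current API
model = ocr_predictor(pretrained=True, extensions=[])
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
```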
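For the backend-independent part, here is a rough sketch of how loading an ONNX model with either onnxruntime or OpenCV could work. The helper names `available_runtimes` and `load_onnx_model`, and the preference order, are made up for illustration and are not part of doctr; only `onnxruntime.InferenceSession` and `cv2.dnn.readNetFromONNX` are existing library calls.

```python
from importlib.util import find_spec

# Candidate runtimes that can load an ONNX file; the order is an assumption
_RUNTIMES = ("onnxruntime", "cv2")


def available_runtimes(candidates=_RUNTIMES):
    """Return the candidate modules that are importable in this environment."""
    return [name for name in candidates if find_spec(name) is not None]


def load_onnx_model(path, runtime=None):
    """Load an ONNX model with onnxruntime or OpenCV (hypothetical helper)."""
    # Fall back to the first installed runtime when none is requested
    runtime = runtime or next(iter(available_runtimes()), None)
    if runtime == "onnxruntime":
        import onnxruntime as ort  # imported lazily so it stays optional

        return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    if runtime == "cv2":
        import cv2  # imported lazily so it stays optional

        return cv2.dnn.readNetFromONNX(path)
    raise ImportError(
        "No supported ONNX runtime available (expected onnxruntime or OpenCV)"
    )
```

Since the imports are lazy, neither PyTorch nor TensorFlow is needed, and the extra would only have to pull in one of the two runtimes.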
But yeah that's something we could tackle if the other things are resolved :sweat_smile:
Done with contrib module