[Q] OCR for proper images only

Open jetnet opened this issue 6 years ago • 2 comments

Hello Pascal, is it possible to configure the Document Parser to apply OCR processing only to images above a given size or dimension? There is some metadata that could be checked:

tiff:ImageLength = [756] // pixels
Content-Length = [93191] //bytes

Thanks!

jetnet May 17 '19 11:05

Unfortunately, out of the box, OCR is only supported as part of parsing a file, and the image size is typically only known while parsing. I am marking this as a feature request to allow specifying minimum/maximum dimensions for OCR.

In the meantime, you can create your own parser or use an ExternalTransformer to perform OCR, if applicable.

If you do not want to keep images that are not eligible for OCR, you could also write an IDocumentFilter that extracts the image dimensions and rejects images that do not match what you want.
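
As a rough illustration of the dimension check such a filter could perform, here is a minimal sketch that uses only the standard javax.imageio API. The class name OcrSizeCheck, the method isEligibleForOcr, and the minWidth/minHeight parameters are hypothetical names made up for this example. They are not part of the importer, and wiring the check into an actual IDocumentFilter is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper (not part of the importer API): decides whether an
// image is large enough to be worth sending to OCR.
public final class OcrSizeCheck {

    private OcrSizeCheck() {}

    /**
     * Returns true when the first image in the stream is at least
     * minWidth x minHeight pixels. Returns false when no ImageIO reader
     * recognizes the format.
     */
    public static boolean isEligibleForOcr(
            InputStream in, int minWidth, int minHeight) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            if (iis == null) {
                return false;
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return false; // unknown or unsupported image format
            }
            ImageReader reader = readers.next();
            try {
                // Ask the reader for the dimensions only; the pixel data
                // is never requested.
                reader.setInput(iis, true, true);
                int width = reader.getWidth(0);
                int height = reader.getHeight(0);
                return width >= minWidth && height >= minHeight;
            } finally {
                reader.dispose();
            }
        }
    }
}
```

A filter built on this would reject a document when isEligibleForOcr returns false. Asking the reader for the width and height avoids requesting the pixel data, so the check should stay cheap even when many images are rejected.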

essiembre May 27 '19 02:05

Thank you!

jetnet Jun 13 '19 17:06