importer
[Q] OCR for proper images only
Hello Pascal, is it possible to configure the Document Parser to apply OCR processing only to images of at least a given size / dimension? There is some metadata that could be checked:
tiff:ImageLength = [756] // pixels
Content-Length = [93191] //bytes
Thanks!
Unfortunately, out of the box, OCR is only supported as part of parsing a file, and the image size is typically only known during parsing. I am marking this as a feature request to allow specifying minimum/maximum dimensions for OCR.
In the meantime, you can look at creating your own parser or using an ExternalTransformer to perform OCR, if applicable.
If you do not want to keep images that are not eligible for OCR, you could also write an IDocumentFilter that extracts the image dimensions and rejects those that do not match what you want.
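To give an idea of the dimension check such a filter would need, here is a minimal sketch using only the JDK's `javax.imageio`. It reads the width and height from the image header without decoding the pixels. The class name `OcrSizeCheck`, the method names, and the thresholds are made up for illustration and are not part of the Importer; wiring the check into an actual IDocumentFilter is left out, since the interface signature depends on your Importer version. TIFF support in `ImageIO` is assumed to be available (it is built in from Java 9; older JVMs need a plugin).

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper, not part of Norconex Importer.
public class OcrSizeCheck {

    // Returns true only if the stream is a readable image whose first frame
    // is at least minWidth x minHeight pixels.
    public static boolean isEligibleForOcr(
            InputStream input, int minWidth, int minHeight) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(input)) {
            if (iis == null) {
                return false; // no stream provider could wrap the input
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return false; // not a format ImageIO recognizes
            }
            ImageReader reader = readers.next();
            try {
                // Only the header is needed to obtain the dimensions.
                reader.setInput(iis, true, true);
                int width = reader.getWidth(0);
                int height = reader.getHeight(0); // "tiff:ImageLength" is the height
                return width >= minWidth && height >= minHeight;
            } finally {
                reader.dispose();
            }
        }
    }

    // Builds a blank PNG in memory so the sketch can be tried without files.
    public static byte[] samplePng(int width, int height) throws IOException {
        BufferedImage img =
                new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] png = samplePng(1024, 756);
        System.out.println(
                isEligibleForOcr(new ByteArrayInputStream(png), 800, 600));
        System.out.println(
                isEligibleForOcr(new ByteArrayInputStream(png), 2000, 2000));
    }
}
```

Inside a filter, you would call something like `isEligibleForOcr(...)` on the document stream and reject the document when it returns false. If the dimensions are already present in the metadata at that point (for example `tiff:ImageLength`), comparing those values directly would avoid reading the stream at all.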
Thank you!