azure-search-openai-demo
How to disable OCR in prepdocs script?
This issue is for a: (mark with an `x`)
- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
- Run the `prepdocs.sh` script on some PDF files that contain images
- Text from the embedded images gets indexed
Any log messages given by the failure
n/a
Expected/desired behavior
I'd like a way to disable OCR of images embedded in PDF files. Our use case is application and training documentation that includes screenshots of application screens showing random or example data, and we don't want that data in the index.
OS and Version?
Linux Ubuntu
Versions
2024-08-23