azure-search-openai-demo

How to disable OCR in prepdocs script?

Open: egor-yudkin opened this issue 5 months ago • 1 comment

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. Run the prepdocs.sh script on PDF files that contain images
  2. Text from the embedded images gets indexed

Any log messages given by the failure

n/a

Expected/desired behavior

I'd like a way to disable OCR of images embedded in PDF files. Our use case is application and training documentation that includes screenshots of application screens showing random/example data, and we don't want that data to end up in the index.
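
A minimal sketch of the requested behavior, assuming the parser could tag each extracted block with its origin. The `Block` type, the `from_image` field, and the `ocr_enabled` switch are hypothetical names used only for illustration; they are not existing prepdocs options or types.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical unit of parsed PDF content (not a real prepdocs type)."""
    text: str
    from_image: bool  # True if the text was recovered by OCR from an embedded image


def filter_blocks(blocks, ocr_enabled=True):
    """Return the text to index, skipping OCR-derived blocks when OCR is disabled."""
    return [b.text for b in blocks if ocr_enabled or not b.from_image]


blocks = [
    Block("Click Save to store the record.", from_image=False),
    Block("John Doe 555-0100", from_image=True),  # example data from a screenshot
]

# With OCR disabled, only the document's own text would reach the index.
print(filter_blocks(blocks, ocr_enabled=False))
```

With the hypothetical switch left at its default, both blocks would be indexed, which matches the current behavior described in the steps above.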

OS and Version?

Linux Ubuntu

Versions

2024-08-23

egor-yudkin, Sep 06 '24 15:09