azure-search-openai-demo
Process fails for certain pdf document but not others, not sure why
Please provide us with the following information:
This issue is for a: (mark with an x)
- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
I added some custom PDFs to my data folder in place of the default Northwind Traders documents, and all of them worked fine except this one: [https://www.faa.gov/sites/faa.gov/files/regulations_policies/handbooks_manuals/aviation/airplane_handbook/00_afh_full.pdf] The script split it into roughly 200 pages and uploaded the first 9 blobs (0-8), but it failed when uploading blob #9. The PDF seems to have a blank page at that position, containing only a page number, but when I modified the script to start from page 10, it failed there as well.
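One way to narrow this down is to process every page independently and record which ones raise, rather than stopping at the first failure. The sketch below is only an illustration: `find_failing_pages` and `check_pdf` are hypothetical helper names, not part of prepdocs.py, and `check_pdf` assumes the per-page split is done with pypdf's `PdfReader` and `PdfWriter`. It exercises only the split step, not the blob upload, so if every page passes here the problem is more likely on the upload side.

```python
import io
import traceback
from typing import Callable, Dict


def find_failing_pages(num_pages: int, process_page: Callable[[int], None]) -> Dict[int, str]:
    """Call process_page(i) for each page index and keep the traceback of every failure."""
    failures: Dict[int, str] = {}
    for i in range(num_pages):
        try:
            process_page(i)
        except Exception:
            # Keep going so we learn whether one page or many pages are affected.
            failures[i] = traceback.format_exc()
    return failures


def check_pdf(path: str) -> Dict[int, str]:
    """Write each page to its own in-memory PDF (assumption: similar to a per-page split)."""
    from pypdf import PdfReader, PdfWriter  # imported lazily; requires pypdf to be installed

    reader = PdfReader(path)

    def split_page(i: int) -> None:
        writer = PdfWriter()
        writer.add_page(reader.pages[i])
        writer.write(io.BytesIO())

    return find_failing_pages(len(reader.pages), split_page)
```

Calling `check_pdf("00_afh_full.pdf")` on a local copy of the file would return a dictionary mapping each failing page index to its full traceback, which is also more useful to paste into an issue than a summary of the errors.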
Any log messages given by the failure
A string of errors. The most recent was at line 312 in prepdocs.py; the earlier ones were in azure\core\pipeline\transport.
Expected/desired behavior
OS and Version?
Windows 11
Versions
Ran this on 4/17/2023. Five or six other PDFs of similar origin worked without error, but this one did not. I'm curious whether anyone can figure out why, and what is different about this file.
Mention any other details that might be useful
Thanks! We'll be in touch soon.
I ran into a similar issue. In my case, I was able to solve it by updating the pypdf dependency (pypdf==3.7.1).
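To see whether this applies to your environment, you can check which pypdf version is actually installed before changing anything. This is a minimal sketch using only the standard library; the helper names are made up for illustration, and the simple numeric comparison assumes plain dotted versions such as 3.7.1 (it ignores pre-release suffixes).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.7.1' into (3, 7, 1); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_at_least(installed: str, required: str) -> bool:
    """Compare two dotted version strings numerically."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    found = installed_version("pypdf")
    if found is None:
        print("pypdf is not installed in this environment")
    else:
        print(f"pypdf {found}; at least 3.7.1: {is_at_least(found, '3.7.1')}")
```

If the installed version is older, `pip install pypdf==3.7.1` (or the equivalent pin in requirements.txt) matches the version mentioned above.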
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.