azure-search-openai-demo
GPT-4-vision approach does not support non-PDF document formats
Our prepdocs code currently only uploads image versions of PDF documents, per this line of code:
if self.store_page_images and os.path.splitext(file.content.name)[1].lower() == ".pdf":
    return await self.upload_pdf_blob_images(service_client, container_client, file)
That function then uses a local PDF reader library to render each PDF page as an image, and writes the citation onto it with Pillow.
I'm not sure if we can reasonably upload images of all the MS document formats, but presumably we could support all the image formats using Pillow.
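As a rough sketch of what image support might look like, the check on the file extension could be widened to a set of image formats, and the citation could be stamped directly onto the image instead of onto a rendered PDF page. The names `IMAGE_EXTENSIONS`, `is_image_file`, and `stamp_citation` are hypothetical and not part of prepdocs, and the extension list, text position, and PNG output are assumptions for illustration only.

```python
import io
import os

# Hypothetical set of extensions to treat as images (an assumption, not the
# project's list). Pillow can read all of these formats.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tiff"}


def is_image_file(filename: str) -> bool:
    """Return True if the extension is one we would hand to Pillow."""
    return os.path.splitext(filename)[1].lower() in IMAGE_EXTENSIONS


def stamp_citation(image_bytes: bytes, citation: str) -> bytes:
    """Draw the citation text in the top-left corner and return PNG bytes."""
    # Pillow is a third-party dependency; it is imported here so the
    # extension check above can be used without it.
    from PIL import Image, ImageDraw

    with Image.open(io.BytesIO(image_bytes)) as original:
        # Convert to RGB so palette-based formats such as GIF accept an RGB fill.
        image = original.convert("RGB")

    draw = ImageDraw.Draw(image)
    # Uses Pillow's default font; position and color are arbitrary choices.
    draw.text((10, 10), citation, fill=(0, 0, 0))

    output = io.BytesIO()
    image.save(output, format="PNG")
    return output.getvalue()
```

The upload branch could then call something like `stamp_citation` when `is_image_file(file.content.name)` is true, alongside the existing `.pdf` check. Multi-frame formats (animated GIF, multi-page TIFF) would only have their first frame stamped in this sketch.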
Please star this issue if you're affected by this.