Include the GPT-4 Vision model so the app can embed images and search over them.
## Motivation
Company data often comprises various types of images, including screenshots, maps, and diagrams. By enabling images to be ingested and processed via the admin app, the chat app can provide more accurate and relevant responses to user queries that involve visual data. This ensures that the chat app can fully utilise all available company data to deliver an improved user experience.
Note: image processing is only available using GPT-4; see https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-preview
## Requirements
- Ensure existing application works correctly with GPT-4
- Allow images to be uploaded via the Admin application
- When "Reprocess all" is click via the Admin app, reprocess the images
- When a question is asked, image data should be searched and passed to `gpt-4-vision` to generate a response (see the sketch below this list)
- Citations should link to the image stored in blob storage
- ~Stretch: Fallback to OCR/document intelligence if image of a document detected~
- ~Stretch: Allow images to be uploaded when chatting~
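For illustration, here is a minimal sketch of the query-time call described above, assuming the `openai` v1 Python SDK and an Azure OpenAI deployment named `gpt-4-vision`; the environment variable names, deployment name, and blob URL are assumptions, not the final implementation:

```python
import os

from openai import AzureOpenAI

# Hypothetical configuration -- names are illustrative, not the app's actual settings.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

# Image surfaced by search, stored in blob storage; the citation links back to it.
image_url = "https://<storage-account>.blob.core.windows.net/documents/diagram.png"

response = client.chat.completions.create(
    model="gpt-4-vision",  # Azure deployment name (assumed), not the base model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this diagram show?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ],
    max_tokens=500,
)

print(response.choices[0].message.content)
print(f"[1]: {image_url}")  # citation pointing at the blob the image came from
```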
## Tasks
- [x] #713
- [x] #715
- [x] #728
- [x] #748
- [x] #749
- [x] #750
- [x] #752
- [x] #965
- [ ] #964
- [ ] #993
- [ ] Investigate whether it's possible to turn advanced image processing on/off without reprovisioning
- [ ] Allow images to be uploaded via the pull model (if possible; this link suggests it may not be: https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/gpt4v.md#setup-and-usage)
## Bugs
- [x] #929
Reference: https://github.com/Azure-Samples/azure-search-openai-demo/pull/1056
Update 22nd April:
After spiking possible technology choices, I believe the best way forward is to:
- Use Azure Computer Vision to generate embeddings of the image
- Use `gpt-4-vision` to generate a description of the image and `text-embedding-ada-002` to embed the description
- Store both embedding vectors in the Azure AI Search index
Then, when querying, generate embeddings of the question using both Azure Computer Vision and `text-embedding-ada-002`.
Note: this does require us to change the index to add an additional `imageEmbeddings` field.
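A rough sketch of what that ingestion flow could look like, assuming the Computer Vision multimodal embeddings preview REST API, the `openai` v1 SDK, and `azure-search-documents`; the environment variables, deployment names, and index shape are all assumptions for illustration:

```python
import os

import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

def vectorize_image(image_url: str) -> list[float]:
    """Image embedding via the Computer Vision multimodal embeddings API (preview)."""
    resp = requests.post(
        f"{os.environ['VISION_ENDPOINT']}/computervision/retrieval:vectorizeImage",
        params={"api-version": "2023-04-01-preview", "model-version": "2023-04-15"},
        headers={"Ocp-Apim-Subscription-Key": os.environ["VISION_KEY"]},
        json={"url": image_url},
    )
    resp.raise_for_status()
    return resp.json()["vector"]

def describe_image(image_url: str) -> str:
    """gpt-4-vision generates a searchable text description of the image."""
    result = openai_client.chat.completions.create(
        model="gpt-4-vision",  # assumed deployment name
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Describe this image for search indexing."},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]}],
        max_tokens=300,
    )
    return result.choices[0].message.content

def embed_text(text: str) -> list[float]:
    """Text embedding of the generated description."""
    return openai_client.embeddings.create(
        model="text-embedding-ada-002",  # assumed deployment name
        input=text,
    ).data[0].embedding

search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="documents",  # assumed index name
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)

image_url = "https://<storage-account>.blob.core.windows.net/documents/map.png"
description = describe_image(image_url)
search_client.upload_documents([{
    "id": "map-png",
    "content": description,
    "contentVector": embed_text(description),       # ada embedding of the description
    "imageEmbeddings": vectorize_image(image_url),  # the new image vector field
    "source": image_url,
}])
```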
I was initially going to create an ADR deciding on which tools would be best to use, but given my research, spike, and investigation into how this is implemented in https://github.com/Azure-Samples/azure-search-openai-demo/pull/1056, I now believe using both approaches combined will give the best results.
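On the query side, combining both approaches might look roughly like this, with the same assumptions as the sketch above, plus the Computer Vision `retrieval:vectorizeText` operation and `azure-search-documents` 11.4+ (field and index names remain assumptions):

```python
import os

import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

def embed_text(text: str) -> list[float]:
    """ada embedding of the question, matching the contentVector field."""
    return openai_client.embeddings.create(
        model="text-embedding-ada-002", input=text  # assumed deployment name
    ).data[0].embedding

def vectorize_text(text: str) -> list[float]:
    """Computer Vision text embedding, matching the imageEmbeddings field."""
    resp = requests.post(
        f"{os.environ['VISION_ENDPOINT']}/computervision/retrieval:vectorizeText",
        params={"api-version": "2023-04-01-preview", "model-version": "2023-04-15"},
        headers={"Ocp-Apim-Subscription-Key": os.environ["VISION_KEY"]},
        json={"text": text},
    )
    resp.raise_for_status()
    return resp.json()["vector"]

search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="documents",  # assumed index name
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)

question = "Where is the datacentre on the campus map?"

results = search_client.search(
    search_text=question,
    vector_queries=[
        # ada embedding of the question against the text vector field...
        VectorizedQuery(vector=embed_text(question),
                        k_nearest_neighbors=3, fields="contentVector"),
        # ...and a Computer Vision text embedding against the image vector field
        VectorizedQuery(vector=vectorize_text(question),
                        k_nearest_neighbors=3, fields="imageEmbeddings"),
    ],
)

for doc in results:
    print(doc["source"])
```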
Next steps are to start building this into CWYDSA.
Update 23rd April:
- The Computer Vision and `gpt-4-vision` model deployment resources are now being provisioned
- This is applied if `USE_GPT4_VISION=true`
- Unfortunately, `gpt-4-vision` does not support function calling, so this is an additional deployment alongside another model (see the sketch after this list)
- Next steps are to allow images to be uploaded via the admin app
- It looks like some images can already be uploaded and parsed, but Computer Vision supports additional file types that need to be handled
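Because `gpt-4-vision` lacks function calling, the app needs to route between the two deployments. A minimal sketch of how that routing could work, using the `USE_GPT4_VISION` flag above (the deployment names are hypothetical):

```python
import os

def pick_deployment(message_has_images: bool) -> str:
    """Choose gpt-4-vision only when the feature flag is on and the request
    actually involves images; otherwise use the function-calling deployment."""
    use_vision = os.environ.get("USE_GPT4_VISION", "false").lower() == "true"
    if use_vision and message_has_images:
        return "gpt-4-vision"  # assumed deployment name; no function calling
    return "gpt-4"             # assumed deployment name; supports function calling

# e.g. deployment = pick_deployment(bool(retrieved_image_urls))
```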
Update 28th May:
The core tasks relating to this story have been completed: uploading images with advanced image processing, querying data based on these images, and passing them to the LLM.
Some outstanding tasks remain: updating the prompts to include the images that are passed to the LLM, and getting this to work with integrated vectorisation. However, it may be better to move these into their own issues so this main epic can be closed.
@ross-p-smith @adamdougal @superhindupur