Include the GPT-4 Vision model so the app can embed images and search over them.
## Motivation
Company data often comprises various types of images, including screenshots, maps, and diagrams. By enabling images to be ingested and processed via the admin app, the chat app can provide more accurate and relevant responses to user queries that involve visual data. This ensures that the chat app can fully utilise all available company data to deliver an improved user experience.
Note: image processing is only available using GPT-4; see https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-preview
## Requirements
- Ensure existing application works correctly with GPT-4
- Allow images to be uploaded via the Admin application
- When "Reprocess all" is click via the Admin app, reprocess the images
- When a question is asked, image data should be searched and passed to `gpt-4-vision` to generate a response (see the sketch below this list)
- Citations should link to the image stored in blob storage
- ~Stretch: Fallback to OCR/document intelligence if image of a document detected~
- ~Stretch: Allow images to be uploaded when chatting~
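For illustration, here is a minimal sketch of the query-time call described above, assuming the `openai` v1 Python SDK and an Azure OpenAI deployment named `gpt-4-vision`; the environment variable names, deployment name, and blob URL are assumptions, not the final implementation:

```python
import os

from openai import AzureOpenAI

# Hypothetical configuration -- names are illustrative, not the app's actual settings.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

# Image surfaced by search, stored in blob storage; the citation links back to it.
image_url = "https://<storage-account>.blob.core.windows.net/documents/diagram.png"

response = client.chat.completions.create(
    model="gpt-4-vision",  # Azure deployment name (assumed), not the base model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this diagram show?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ],
    max_tokens=500,
)

print(response.choices[0].message.content)
print(f"[1]: {image_url}")  # citation pointing at the blob the image came from
```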
## Tasks
- [x] #713
- [x] #715
- [x] #728
- [x] #748
- [x] #749
- [x] #750
- [x] #752
- [x] #965
- [ ] #964
- [ ] #993
- [ ] Investigate whether it's possible to turn advanced image processing on/off without reprovisioning
- [ ] Allow images to be uploaded via the pull model (if possible; this link suggests it may not be: https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/gpt4v.md#setup-and-usage)
## Bugs
- [x] #929
Reference: https://github.com/Azure-Samples/azure-search-openai-demo/pull/1056
Update 22nd April:
After spiking possible technology choices, I believe the best way forward is to:
- Use Azure Computer Vision to generate embeddings of the image
- Use `gpt-4-vision` to generate a description of the image and `text-embedding-ada-002` to embed the description
- Store both embedding vectors in the Azure AI Search index
Then, when querying, generate embeddings of the question using both Azure Computer Vision and `text-embedding-ada-002`.
Note: this does require us to change the index to add an additional `imageEmbeddings` field.
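A rough sketch of what that ingestion flow could look like, assuming the Computer Vision multimodal embeddings preview REST API, the `openai` v1 SDK, and `azure-search-documents`; the environment variables, deployment names, and index shape are all assumptions for illustration:

```python
import os

import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

def vectorize_image(image_url: str) -> list[float]:
    """Image embedding via the Computer Vision multimodal embeddings API (preview)."""
    resp = requests.post(
        f"{os.environ['VISION_ENDPOINT']}/computervision/retrieval:vectorizeImage",
        params={"api-version": "2023-04-01-preview", "model-version": "2023-04-15"},
        headers={"Ocp-Apim-Subscription-Key": os.environ["VISION_KEY"]},
        json={"url": image_url},
    )
    resp.raise_for_status()
    return resp.json()["vector"]

def describe_image(image_url: str) -> str:
    """gpt-4-vision generates a searchable text description of the image."""
    result = openai_client.chat.completions.create(
        model="gpt-4-vision",  # assumed deployment name
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Describe this image for search indexing."},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]}],
        max_tokens=300,
    )
    return result.choices[0].message.content

def embed_text(text: str) -> list[float]:
    """Text embedding of the generated description."""
    return openai_client.embeddings.create(
        model="text-embedding-ada-002",  # assumed deployment name
        input=text,
    ).data[0].embedding

search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="documents",  # assumed index name
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)

image_url = "https://<storage-account>.blob.core.windows.net/documents/map.png"
description = describe_image(image_url)
search_client.upload_documents([{
    "id": "map-png",
    "content": description,
    "contentVector": embed_text(description),       # ada embedding of the description
    "imageEmbeddings": vectorize_image(image_url),  # the new image vector field
    "source": image_url,
}])
```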
I was initially going to create an ADR deciding on which tools would be best to use, but given my research, spike, and investigation into how this is implemented in https://github.com/Azure-Samples/azure-search-openai-demo/pull/1056, I now believe using both approaches combined will give the best results.
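On the query side, combining both approaches might look roughly like this, with the same assumptions as the sketch above, plus the Computer Vision `retrieval:vectorizeText` operation and `azure-search-documents` 11.4+ (field and index names remain assumptions):

```python
import os

import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

def embed_text(text: str) -> list[float]:
    """ada embedding of the question, matching the contentVector field."""
    return openai_client.embeddings.create(
        model="text-embedding-ada-002", input=text  # assumed deployment name
    ).data[0].embedding

def vectorize_text(text: str) -> list[float]:
    """Computer Vision text embedding, matching the imageEmbeddings field."""
    resp = requests.post(
        f"{os.environ['VISION_ENDPOINT']}/computervision/retrieval:vectorizeText",
        params={"api-version": "2023-04-01-preview", "model-version": "2023-04-15"},
        headers={"Ocp-Apim-Subscription-Key": os.environ["VISION_KEY"]},
        json={"text": text},
    )
    resp.raise_for_status()
    return resp.json()["vector"]

search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="documents",  # assumed index name
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)

question = "Where is the datacentre on the campus map?"

results = search_client.search(
    search_text=question,
    vector_queries=[
        # ada embedding of the question against the text vector field...
        VectorizedQuery(vector=embed_text(question),
                        k_nearest_neighbors=3, fields="contentVector"),
        # ...and a Computer Vision text embedding against the image vector field
        VectorizedQuery(vector=vectorize_text(question),
                        k_nearest_neighbors=3, fields="imageEmbeddings"),
    ],
)

for doc in results:
    print(doc["source"])
```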
Next steps are to start building this into CWYDSA.
Update 23rd April:
- The Computer Vision and `gpt-4-vision` model deployment resources are now being provisioned
- This is applied if `USE_GPT4_VISION=true`
- Unfortunately, `gpt-4-vision` does not support function calling, so this is an additional deployment alongside another model (see the sketch after this list)
- Next steps are to allow images to be uploaded via the admin app
- It looks like some images can already be uploaded and parsed, but Computer Vision supports additional file types that need to be handled
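Because `gpt-4-vision` lacks function calling, the app needs to route between the two deployments. A minimal sketch of how that routing could work, using the `USE_GPT4_VISION` flag above (the deployment names are hypothetical):

```python
import os

def pick_deployment(message_has_images: bool) -> str:
    """Choose gpt-4-vision only when the feature flag is on and the request
    actually involves images; otherwise use the function-calling deployment."""
    use_vision = os.environ.get("USE_GPT4_VISION", "false").lower() == "true"
    if use_vision and message_has_images:
        return "gpt-4-vision"  # assumed deployment name; no function calling
    return "gpt-4"             # assumed deployment name; supports function calling

# e.g. deployment = pick_deployment(bool(retrieved_image_urls))
```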
Update 28th May:
The core tasks relating to this story have been completed: uploading images with advanced image processing, querying data based on these images, and passing them to the LLM.
Some outstanding tasks remain: updating the prompts to include the images that are passed to the LLM, and getting this to work with integrated vectorisation. However, it may be better to move these into their own issues so this main epic can be closed.
@ross-p-smith @adamdougal @superhindupur