
Include GPT-4 V model to be able to search for images and embedding images.

Open ross-p-smith opened this issue 1 year ago • 4 comments

Motivation

Company data often comprises various types of images, including screenshots, maps, and diagrams. By enabling the chat admin app to ingest and process these images, it can provide more accurate and relevant responses to user queries that involve visual data. This ensures that the chat app can fully utilise all available company data to deliver an improved user experience.

Note: Image processing is only available with GPT-4; see https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-preview

How would you feel if this feature request was implemented?

gif

Requirements

  • Ensure existing application works correctly with GPT-4
  • Allow images to be uploaded via the Admin application
  • When "Reprocess all" is clicked in the Admin app, reprocess the images
  • When a question is asked, image data should be searched and passed to gpt-4-vision to generate a response
  • Citations should link to the image stored in blob storage
  • ~Stretch: Fallback to OCR/document intelligence if image of a document detected~
  • ~Stretch: Allow images to be uploaded when chatting~

Tasks

  • [x] #713
  • [x] #715
  • [x] #728
  • [x] #748
  • [x] #749
  • [x] #750
  • [x] #752
  • [x] #965
  • [ ] #964
  • [ ] #993
  • [ ] Investigate whether it's possible to turn advanced image processing on/off without reprovisioning
  • [ ] Allow images to be uploaded via the pull model, if possible (this suggests it may not be: https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/gpt4v.md#setup-and-usage)

Bugs

  • [x] #929

ross-p-smith, Feb 22 '24 22:02

Reference here: https://github.com/Azure-Samples/azure-search-openai-demo/pull/1056

ross-p-smith, Mar 25 '24 20:03

Update 22nd April:

After spiking possible technology choices, I believe the best way forward is to:

  • Use Azure Computer Vision to generate embeddings of the image
  • Use GPT-4-vision to generate a description of the image, and text-embedding-ada-002 to embed that description
  • Store both embedding vectors in the Azure AI Search index

Then, when querying, generate embeddings of the question using both Azure Computer Vision and text-embedding-ada-002.

Note: this does require us to change the index to allow for an additional imageEmbeddings field.
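A minimal sketch of the flow described above, for illustration only. The embedding and description calls are passed in as plain callables rather than real SDK calls, and every name other than `imageEmbeddings` (including the `contentVector` field and both function names) is hypothetical and not taken from the CWYDSA codebase.

```python
from typing import Callable, Dict, List

Vector = List[float]


def build_image_document(
    doc_id: str,
    image_url: str,
    describe_image: Callable[[str], str],   # stand-in for a GPT-4-vision call
    embed_text: Callable[[str], Vector],    # stand-in for text-embedding-ada-002
    embed_image: Callable[[str], Vector],   # stand-in for Azure Computer Vision
) -> Dict[str, object]:
    """Build one index document carrying both embedding vectors."""
    description = describe_image(image_url)
    return {
        "id": doc_id,
        "source": image_url,
        "content": description,
        # Hypothetical name for the existing text vector field.
        "contentVector": embed_text(description),
        # The additional field mentioned in the note above.
        "imageEmbeddings": embed_image(image_url),
    }


def build_query_vectors(
    question: str,
    embed_text: Callable[[str], Vector],
    embed_question_for_images: Callable[[str], Vector],
) -> List[Dict[str, object]]:
    """Embed the question once per vector field so both can be searched."""
    return [
        {"field": "contentVector", "vector": embed_text(question)},
        {"field": "imageEmbeddings", "vector": embed_question_for_images(question)},
    ]
```

Passing the model calls in as callables keeps the sketch independent of any particular client library; in a real implementation they would wrap the corresponding Azure services.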

I was initially going to create an ADR deciding which tools would be best to use, but given my research, spike and investigation into how this is implemented in https://github.com/Azure-Samples/azure-search-openai-demo/pull/1056, I now believe using both approaches combined will give the best results.

The next step is to start building this into CWYDSA.

adamdougal, Apr 22 '24 07:04

Update 23rd April:

  • The computer vision and gpt-4-vision model deployment resources are now being provisioned
  • These resources are only provisioned if USE_GPT4_VISION=true
  • Unfortunately, gpt-4-vision does not support function calling, so this is an additional deployment alongside another model
  • Next steps are to allow images to be uploaded via the admin app
  • It looks like some images are already able to be uploaded and parsed, but computer vision supports additional file types that need to be handled
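As a rough illustration of the two-deployment setup described above, the sketch below reads the `USE_GPT4_VISION` flag and picks a deployment per request. Only `USE_GPT4_VISION` comes from this thread; the other environment variable names, the default deployment names, and both functions are hypothetical.

```python
import os
from typing import Mapping


def use_gpt4_vision(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when USE_GPT4_VISION is set to 'true'."""
    return env.get("USE_GPT4_VISION", "false").strip().lower() == "true"


def select_deployment(
    env: Mapping[str, str],
    needs_function_calling: bool,
    has_images: bool,
) -> str:
    """Pick a model deployment name for a single request.

    The vision deployment is used only for image requests that do not
    need function calling, since gpt-4-vision does not support it.
    """
    # Hypothetical variable names and defaults, for illustration only.
    chat = env.get("CHAT_MODEL_DEPLOYMENT", "chat")
    vision = env.get("VISION_MODEL_DEPLOYMENT", "gpt-4-vision")
    if use_gpt4_vision(env) and has_images and not needs_function_calling:
        return vision
    return chat
```

With the flag unset or set to anything other than `true`, every request falls back to the regular chat deployment.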

adamdougal, Apr 23 '24 08:04

Update 28th May:

The core tasks relating to this story have been completed, namely uploading images with advanced image processing, and querying data based on these images, passing these to the LLM.

Some tasks remain outstanding: updating the prompts to include the images that are passed to the LLM, and getting this to work with integrated vectorisation. However, it may be better to move these into their own issues so that this main epic can be closed.

@ross-p-smith @adamdougal @superhindupur

cecheta, May 28 '24 10:05