azure-search-openai-demo
GPT-vision enhancement related queries
Please provide us with the following information:
This issue is for a: (mark with an x)
- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
- [x] enquiries
Minimal steps to reproduce
Any log messages given by the failure
Expected/desired behavior
OS and Version?
Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
azd version?
run `azd version` and copy-paste here.
Versions
Mention any other details that might be useful
I understand that GPT-4V is supported, and I appreciate your effort. The existing solution has Azure Document Intelligence doing the OCR and an OpenAI embedding model doing the text embedding. Computer Vision can also do OCR and can do image embedding.
Question: when credentials for all of these services are provided, will the solution decide which service to use cost-efficiently, or will it apply both?
Example: I have financial performance data in tabular format, but as a screenshot. Azure Document Intelligence should be sufficient for that. But if Computer Vision credentials are also supplied, will Computer Vision embed it as an image instead of letting Azure DI convert it to text, or will both run in parallel? I'm asking because a typical financial report contains a mixture of tables and charts.
For tables in images: Azure DI would be best. For charts: Computer Vision can do the vectorization/embedding.
Thanks again for continuing to improve the repo!
Thanks! We'll be in touch soon.
Hm, I'm not sure I understand the question.
We use Azure Document Intelligence so that we can extract any relevant text from a document, since most documents have text in addition to charts, and we want to find that text when performing a hybrid search.
We use the Computer Vision API to compute an embedding for images, since it can embed images in addition to text. That way, if an image contains a picture of a tree, and the user has a question related to trees, then our search can find both text about trees and images of trees.
So we need both of these tools for different steps. I think we would only drop Document Intelligence if our input data had literally no text in it at all (which is possible! I haven't tried that with this app to see how well that'd work).
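To make the "both tools, different steps" point concrete, here is a minimal sketch of that ingestion flow. All function and field names below are hypothetical placeholders, not the repo's actual code; the real app calls the Azure Document Intelligence, OpenAI, and Azure AI Vision services where these stubs return canned values.

```python
# Sketch of the dual ingestion paths described above.
# Every function here is a stand-in for a real service call.

def extract_text(page: bytes) -> str:
    # Placeholder for Azure Document Intelligence OCR / layout analysis.
    return "Revenue 2023: $1.2M ..."

def embed_text(text: str) -> list[float]:
    # Placeholder for an OpenAI text embedding call.
    return [0.1] * 4  # tiny stand-in vector

def embed_image(page: bytes) -> list[float]:
    # Placeholder for an Azure AI Vision image embedding call.
    return [0.2] * 4

def index_page(page: bytes) -> dict:
    """Both paths run for every page: the extracted text is embedded
    AND the page image itself is embedded. The two embeddings are
    complementary search fields, not alternatives chosen by cost."""
    text = extract_text(page)
    return {
        "content": text,              # searchable text for hybrid search
        "text_vector": embed_text(text),
        "image_vector": embed_image(page),
    }

doc = index_page(b"fake-page-bytes")
```

The key design point is in `index_page`: there is no branch that picks one service over the other, so a page containing a chart gets a text embedding of whatever text was recoverable plus an image embedding of the whole page.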
@pamelafox thanks for trying to answer my question, although my question may have been a bit confusing.
Maybe I can provide an example:
If the table below is an image in my PDF, will it be converted to text and text-embedded, and at the same time also image-embedded if Computer Vision is enabled?