azure-search-openai-demo
Upload file from browser
Purpose
- Implemented a new feature that allows users to upload multiple PDF files directly from their browsers.
- Streamlined the user interface for the UploadFiles component, making the frontend code more concise and easily understandable.
- Developed an API endpoint on the frontend to handle file uploads.
- Defined the API's POST method for uploading files in the app.py file.
- Introduced a new module named uploadDocs.py to handle various document-related functionalities.
- Specified the supported formats for uploaded documents.
- Created functions for uploading files to a blob container.
- Implemented a function that extracts text from uploaded documents using PDF text extraction techniques.
- Developed a function to divide the text pages of a document into sections.
- Designed a function to index the extracted sections from documents into a search index using a search client.
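As a rough illustration of the format check and the page-to-section split listed above, here is a minimal sketch. It is not the code from uploadDocs.py: the names `SUPPORTED_EXTENSIONS`, `is_supported`, and `split_into_sections`, and the section length and overlap values, are all assumptions made for this example.

```python
from bisect import bisect_right
from typing import List, Tuple

# Assumed values for illustration only; the PR does not state its settings.
SUPPORTED_EXTENSIONS = {".pdf"}
MAX_SECTION_LENGTH = 1000
SECTION_OVERLAP = 100


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported set."""
    dot = filename.rfind(".")
    return dot != -1 and filename[dot:].lower() in SUPPORTED_EXTENSIONS


def split_into_sections(
    pages: List[str],
    max_length: int = MAX_SECTION_LENGTH,
    overlap: int = SECTION_OVERLAP,
) -> List[Tuple[int, str]]:
    """Split page texts into overlapping sections.

    Returns (page_index, section_text) pairs, where page_index is the
    zero-based page on which the section starts.
    """
    if overlap >= max_length:
        raise ValueError("overlap must be smaller than max_length")

    # Record where each page begins in the concatenated text.
    offsets = []
    text = ""
    for page in pages:
        offsets.append(len(text))
        text += page

    sections = []
    start = 0
    while start < len(text):
        end = min(start + max_length, len(text))
        page_index = bisect_right(offsets, start) - 1
        sections.append((page_index, text[start:end]))
        if end == len(text):
            break
        # Step back so neighbouring sections share some context.
        start = end - overlap
    return sections
```

Each section keeps the index of the page it starts on, so a search result can point back to its source page. The overlap is there so that a sentence cut at a section boundary still appears whole in one of the two neighbouring sections.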
Does this introduce a breaking change?
[ ] Yes
[X] No
Pull Request Type
What kind of change does this Pull Request introduce?
[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:
How to Test
git clone [repo-address]
cd [repo-name]
Go to the app directory and run the start script:
cd openai-enterprise-data-demo/app
./start.sh
- Test the code
What to Check
Verify that the following are valid
- ...
Other Information
@RajaCodeArchitect Thanks for the PR! Please fill out the PR description fully, and include details about how this PR is similar/different to the other upload files PR.
@pamelafox Please find screenshots of the implementation below:
Frontend
Backend File upload status
Hi @pamelafox, I have updated the PR description.
Dear @RajaCodeArchitect and @pamelafox, I tried this and it works, but only locally. It does not work at the deployed URL.
In my case, I had to give the application in Azure the right to access the Storage. You must grant the app (User not managed Identity) the following roles at RG level: Search Index Data Contributor and Storage Blob Data Contributor.
@edercarlima Did you check the app's rights? You must grant the app (User not managed Identity) the following roles at RG level: Search Index Data Contributor and Storage Blob Data Contributor.
Hi @superpoussin22, sorry, let me write down my understanding here. In this case, should I go to the resource group (RG), select the application, and assign these permissions under the Access Control (IAM) option? If my understanding is incorrect, could you describe the steps in detail?
When I open the application and check my user under Check access > My access > View my access, Current role assignments shows these roles under Role assignments.
The roles must be assigned to the application, something like "app-backend-xyzfkkjkfjjfkfjfjkf", not to your user, because the app needs the rights to write files and update the index in Cognitive Search.
Hi @superpoussin22
I was able to upload after granting permission to the application through the resource group. Thanks a lot for the help
looking forward to this PR getting merged to main! It's a cool feature and would be super useful 😇
We likely will not be merging a PR with file upload functionality until we have a solid authentication automation mechanism, as we don't want developers to accidentally deploy an app that gives users write access to their resources. But we encourage you to integrate this functionality if it makes sense for your app, just please add an authentication layer on top.
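A minimal sketch of what such an authentication layer could look like is shown here. It assumes the app runs behind Azure App Service built-in authentication, which sets the `X-MS-CLIENT-PRINCIPAL-ID` header on authenticated requests; that header should only be trusted when the platform's authentication is actually enabled. The names `require_user` and `handle_upload`, and the per-user blob naming scheme, are hypothetical and not part of this PR.

```python
from typing import Mapping, Tuple

# Header set by Azure App Service built-in authentication (assumed to be enabled).
PRINCIPAL_ID_HEADER = "X-MS-CLIENT-PRINCIPAL-ID"


class NotAuthenticated(Exception):
    """Raised when a request carries no authenticated user ID."""


def require_user(headers: Mapping[str, str]) -> str:
    """Return the caller's user ID, or raise NotAuthenticated."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == PRINCIPAL_ID_HEADER.lower() and value.strip():
            return value.strip()
    raise NotAuthenticated("upload requires a signed-in user")


def handle_upload(headers: Mapping[str, str], filename: str) -> Tuple[int, str]:
    """Hypothetical upload handler: reject anonymous callers and build a
    per-user blob name. Returns (status_code, message_or_blob_name)."""
    try:
        user_id = require_user(headers)
    except NotAuthenticated as exc:
        return 401, str(exc)
    # Drop any directory components supplied by the client.
    safe_name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    # Prefixing with the user ID keeps each user's uploads separate.
    return 200, f"{user_id}/{safe_name}"
```

A real handler would go on to write the file to blob storage under the returned name. The sketch only shows the gate that keeps anonymous users from reaching that step.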
Hey @pamelafox, @zlr-raja ,
I am investigating the upload function. Thanks so much for your contribution, Raja. I am looking for a user-specific feature. Do you have any hints on how I can set up a user-specific index and upload, something like a user-specific blob storage for uploading and indexing?
I do not want everyone in my organisation to be able to look at everyone else's uploaded files.
If you have any documentation or a README to guide me, that would be perfect.
Thanks in advance. Looking forward to your support!
@zlr-raja One additional question: do you think it is also possible to identify which files a user has uploaded and to allow the user to delete them? That would be amazing. For me, after I press the upload button the file disappears and I only get the upload confirmation.
would be awesome!
Hi @superpoussin22, @edercarlima, thank you for your comments, they were very useful.
I granted the necessary permissions at the RG level, and the indexes and blob data are generated when I use the upload button.
But when I ask in the chat, it does not recognize the new documents. It usually tells me that it does not have the knowledge or that the answer is not found in the documents available to consult.
Do you know what the cause could be, or how you solved it?
Thank you @zlr-raja for this great code! I was wondering whether you have already tried to adapt the code for other formats as well, or are thinking of doing so.
Hello, how do we use this with the latest version of https://github.com/Azure-Samples/azure-search-openai-demo? It seems this pull request uses the Flask API?
You would need to do a fair bit of merging. We are planning to add document-upload as a feature of the repo itself, but it will include access control considerations, so it is taking more time to properly implement.
Based on @zlr-raja's work, I've forked his code so that it works with the current Azure infra and GPT-4:
https://github.com/beouk/azure-search-openai-demo/tree/uploadFileFromBrowser
PR: https://github.com/zlr-raja/azure-search-openai-demo/pull/1
Note: It still uses his original fork, so it's 8+ months behind the current repo. Only use it if file uploads are critical to your project.
Hoping to work on a fork that's merged with the current repo or delete it when upload functionality with auth is added.
You will still need to grant your app extra roles as above.
@beouk I am working on a branch based on the current version of the repo. It will use a slightly different architecture than this one (using a function for data ingestion) but similar frontend. I'll try to send it in its WIP state today so folks can see it.
@pamelafox that would be very very helpful
You can see the proposed design here: https://github.com/Azure-Samples/azure-search-openai-demo/issues/1393
Hello @beouk I cloned the repo you linked but when I try to deploy it using azd up, it keeps giving the error below. I tried playing around with gpt-35-turbo to reduce the quota usage but nothing works... Please let me know if you have a solution or if I am doing something wrong. Thanks!
ERROR: deployment failed: failing invoking action 'provision', error deploying infrastructure: deploying to subscription:
Deployment Error Details: InvalidTemplateDeployment: The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '0a49ab3d-56cc-42cd-8af3-04600838be43'. See inner errors for details. InsufficientQuota: This operation require 30 new capacity in quota Tokens Per Minute (thousands) - GPT-35-Turbo, which is bigger than the current available capacity 10. The current quota usage is 230 and the quota limit is 240 for quota Tokens Per Minute (thousands) - GPT-35-Turbo.
TraceID: ce981031c409dcfe8c3f4de3a3265f0b
ERROR: error executing step command 'provision': deployment failed: failing invoking action 'provision', error deploying infrastructure: deploying to subscription:
Deployment Error Details: InvalidTemplateDeployment: The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '0a49ab3d-56cc-42cd-8af3-04600838be43'. See inner errors for details. InsufficientQuota: This operation require 30 new capacity in quota Tokens Per Minute (thousands) - GPT-35-Turbo, which is bigger than the current available capacity 10. The current quota usage is 230 and the quota limit is 240 for quota Tokens Per Minute (thousands) - GPT-35-Turbo.
TraceID: ce981031c409dcfe8c3f4de3a3265f0b
@sanskartewatia You've run out of OpenAI quota in your Azure region. You either need to add your OpenAI environment variables or delete some OpenAI model deployments to make room for provisioning one in this project.
https://github.com/beouk/azure-search-openai-demo
@beouk Thanks for the quick response. I selected a different region and the repo (https://github.com/beouk/azure-search-openai-demo.git) is deployed (https://app-backend-ralp2blkzcdvq.azurewebsites.net/), but there is no upload button on the ask or the chat page. Could you please share the link to the code that has this upload functionality working? This feature is critical for my task right now, so even if it is 8 months old, it's fine. Thanks
@sanskartewatia Apologies, you'll need the upload branch too: https://github.com/beouk/azure-search-openai-demo/tree/uploadFileFromBrowser
@beouk Thanks for sharing the link. After deploying this, there is no upload button here either.
Hmm the code is there, did you restart?
Hello @beouk, when I try to deploy it after it has created all the resources, running azd deploy gives me the following error. Could you please tell me whether I have to change some filename in the code? Thanks for your help
(x) Failed: Deploying service backend
ERROR: getting target resource: resource not found: unable to find a resource tagged with 'azd-service-name: backend'. Ensure the service resource is correctly tagged in your infrastructure configuration, and rerun provision
If you're using my branch, you will need to enable the upload feature:
azd env set USE_USER_UPLOAD true
However, it might be easier to wait for the feature to be finalized, since I'll have docs ready then.
You'll also need to enable auth and ACL following our existing docs, as this feature depends on those being enabled.