azure-search-openai-demo

Upload file from browser

Open zlr-raja opened this issue 2 years ago • 15 comments

Purpose

  • Implemented a new feature that allows users to upload multiple PDF files directly from their browsers.
  • Streamlined the user interface for the UploadFiles component, making the frontend code more concise and easily understandable.
  • Developed an API endpoint on the frontend to handle file uploads.
  • Defined the API's POST method for uploading files in the app.py file.
  • Introduced a new module named uploadDocs.py to handle various document-related functionalities.
  • Specified the supported formats for uploaded documents.
  • Created functions for uploading files to a blob container.
  • Implemented a function that extracts text from uploaded documents using PDF text extraction techniques.
  • Developed a function to divide the text pages of a document into sections.
  • Designed a function to index the extracted sections from documents into a search index using a search client.
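
As a rough illustration of two of the steps listed above (checking the supported formats and dividing page text into sections), here is a minimal sketch. It is not the code from uploadDocs.py: the names `SUPPORTED_EXTENSIONS`, `is_supported`, and `split_into_sections`, as well as the section length and overlap, are assumptions made for this example.

```python
from pathlib import Path

# Hypothetical values for illustration; the PR's actual formats and sizes may differ.
SUPPORTED_EXTENSIONS = {".pdf"}
MAX_SECTION_LENGTH = 1000
SECTION_OVERLAP = 100


def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported set (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_into_sections(
    pages: list[str],
    max_len: int = MAX_SECTION_LENGTH,
    overlap: int = SECTION_OVERLAP,
) -> list[str]:
    """Join the page texts and cut them into fixed-size sections that overlap."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    text = "\n".join(pages)
    sections = []
    start = 0
    while start < len(text):
        sections.append(text[start:start + max_len])
        if start + max_len >= len(text):
            break  # the last section reached the end of the text
        start += max_len - overlap  # step forward, keeping `overlap` characters
    return sections
```

The overlap means a sentence that straddles a section boundary still appears whole in at least one section, which tends to help retrieval. A character-based split is the simplest option; splitting on sentence boundaries would be a reasonable refinement.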

Does this introduce a breaking change?

[ ] Yes
[X] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

  • Clone the repository: git clone [repo-address]
  • Enter the repository: cd [repo-name]
  • Go to the app directory: cd openai-enterprise-data-demo/app
  • Run ./start.sh
  • Test the code

What to Check

Verify that the following are valid

  • ...

Other Information

zlr-raja avatar Jul 28 '23 09:07 zlr-raja

@RajaCodeArchitect Thanks for the PR! Please fill out the PR description fully, and include details about how this PR is similar/different to the other upload files PR.

pamelafox avatar Jul 28 '23 13:07 pamelafox

@pamelafox Please find screenshots of the implementation below.

Frontend

[Screenshot: uploadfiles]

[Screenshot: uploadFIles1]

Backend file upload status

[Screenshot: backendStatus]

zlr-raja avatar Jul 28 '23 15:07 zlr-raja

Hi @pamelafox, I have updated the PR description.

zlr-raja avatar Jul 31 '23 11:07 zlr-raja

Dear @RajaCodeArchitect and @pamelafox, I tried this and it works, but only locally. It does not work at the deployed URL.

MazenSiraj avatar Jul 31 '23 11:07 MazenSiraj

For me, the fix was to give the application in Azure the rights to access the storage. You must grant the app (User not managed Identity) the following roles at the resource group (RG) level:

  • Search Index Data Contributor
  • Storage Blob Data Contributor

superpoussin22 avatar Jul 31 '23 14:07 superpoussin22

@edercarlima Did you check the app's rights? You must grant the app (User not managed Identity) the following roles at the resource group (RG) level:

  • Search Index Data Contributor
  • Storage Blob Data Contributor

superpoussin22 avatar Aug 09 '23 13:08 superpoussin22

Hi @superpoussin22 sorry, let me write here my understanding. In this case, should I go to the resource group (RG), select the application and in the Access Control (IAM) option, associate these permissions? In case my understanding is incorrect, could you detail the step by step?

When I open the application and check my user under Check access > My access > View my access, the Current role assignments list shows these roles in the Role assignments option.

edercarlima avatar Aug 09 '23 16:08 edercarlima

The roles must be assigned to the application (something like "app-backend-xyzfkkjkfjjfkfjfjkf"), not to your user, because the app needs the rights to write files and update the index in Cognitive Search.

superpoussin22 avatar Aug 09 '23 16:08 superpoussin22
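
For readers who prefer the command line to the portal, the role assignments described above could be scripted roughly as follows. This is a sketch under assumptions: it only builds `az role assignment create` command strings for review, and the principal ID, subscription ID, and resource group name passed in are placeholders you would need to look up yourself. The function name `role_assignment_commands` is hypothetical.

```python
import shlex

# The two roles named in this thread.
REQUIRED_ROLES = ["Search Index Data Contributor", "Storage Blob Data Contributor"]


def role_assignment_commands(
    principal_id: str, subscription_id: str, resource_group: str
) -> list[str]:
    """Build one Azure CLI command per required role, scoped to the resource group."""
    scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return [
        shlex.join(
            [
                "az", "role", "assignment", "create",
                "--assignee", principal_id,  # identity of the app, not your own user
                "--role", role,
                "--scope", scope,
            ]
        )
        for role in REQUIRED_ROLES
    ]
```

Printing the result of `role_assignment_commands("<app-principal-id>", "<subscription-id>", "<resource-group>")` gives two commands that can be inspected before running them in a shell where the Azure CLI is signed in.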

Hi @superpoussin22

I was able to upload after granting permission to the application through the resource group. Thanks a lot for the help

edercarlima avatar Aug 10 '23 13:08 edercarlima

looking forward to this PR getting merged to main! It's a cool feature and would be super useful 😇

mratanusarkar avatar Sep 28 '23 05:09 mratanusarkar

We likely will not be merging a PR with file upload functionality until we have a solid authentication automation mechanism, as we don't want developers to accidentally deploy an app that gives users write access to their resources. But we encourage you to integrate this functionality if it makes sense for your app, just please add an authentication layer on top.

pamelafox avatar Sep 28 '23 19:09 pamelafox
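
One possible shape for the "authentication layer on top" mentioned above is to refuse uploads from callers without an identity. The sketch below is framework-agnostic and makes a loud assumption: that the app runs behind Azure App Service built-in authentication, which passes the signed-in user's ID in the `X-MS-CLIENT-PRINCIPAL-ID` request header. If that authentication is not enabled in front of the app, the header can be forged and this check gives no protection. The names `require_user` and `handle_upload` are hypothetical.

```python
from typing import Mapping

# Assumption: App Service built-in authentication is enabled and sets this header.
PRINCIPAL_ID_HEADER = "X-MS-CLIENT-PRINCIPAL-ID"


class NotAuthenticated(Exception):
    """Raised when a request carries no user identity."""


def require_user(headers: Mapping[str, str]) -> str:
    """Return the caller's user ID, or raise if the identity header is missing."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {key.lower(): value for key, value in headers.items()}
    user_id = lowered.get(PRINCIPAL_ID_HEADER.lower(), "").strip()
    if not user_id:
        raise NotAuthenticated("no signed-in user on this request")
    return user_id


def handle_upload(headers: Mapping[str, str], filename: str) -> tuple[int, str]:
    """Hypothetical upload handler: reject anonymous callers before touching storage."""
    try:
        user_id = require_user(headers)
    except NotAuthenticated:
        return 401, "authentication required"
    # A real handler would write to blob storage and update the index here.
    return 200, f"{filename} accepted for {user_id}"
```

The point of the design is ordering: the identity check runs before any write to storage or the search index, so an anonymous request never reaches the resources the app has write access to.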

Hey @pamelafox, @zlr-raja ,

I am investigating the upload function. Thanks so much for your contribution, Raja. I am looking for a user-specific feature. Do you have any hints on how I can set up a user-specific index and upload, something like user-specific blob storage for uploading and indexing?

I do not want everyone in my organisation to be able to look at everyone else's uploaded files.

If you have any documentation or a README that could guide me, that would be perfect.

Thanks in advance. Looking forward to your support!

RobSch1406 avatar Dec 02 '23 20:12 RobSch1406
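
One common way to approach the per-user separation asked about above is to store each upload under a prefix named after the uploading user and to check that prefix before reading or deleting. The sketch below shows only the naming and the check in plain Python. It is an illustration, not something from this PR: `user_blob_name` and `can_access` are hypothetical helpers, and the search index would still need its own per-user filtering.

```python
from pathlib import PurePosixPath


def user_blob_name(user_id: str, filename: str) -> str:
    """Place an upload under a folder named after the uploading user."""
    # Keep only the final path component so "../" tricks cannot escape the folder.
    safe_name = PurePosixPath(filename.replace("\\", "/")).name
    if not user_id or "/" in user_id or safe_name in {"", ".", ".."}:
        raise ValueError("invalid user ID or filename")
    return f"{user_id}/{safe_name}"


def can_access(user_id: str, blob_name: str) -> bool:
    """Allow access only to blobs stored under the user's own folder."""
    return blob_name.startswith(f"{user_id}/")
```

Including the trailing slash in the prefix check matters: without it, user "u1" would match blobs that belong to user "u10". The same check could gate a delete operation, which would also cover listing and removing a user's own files.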

@zlr-raja One additional question: do you think it is also possible to identify the file a user has uploaded and to enable deleting it? That would be amazing. For me, after I press the upload button the file disappears and I only get the upload confirmation.

would be awesome!

RobSch1406 avatar Dec 03 '23 00:12 RobSch1406

Hi @superpoussin22, @edercarlima, thank you for your comments, they were very useful.

I granted the necessary permissions at the RG level, and the indexes and data are generated in the blob storage when I use the upload button.

[Screenshot]

But when I ask in the chat, it is not able to recognize the new documents. It usually tells me that it does not have the knowledge or that the answer is not found within the documents it can consult.

[Screenshot]

Do you know what the cause could be, or how you solved it?

mrkog3 avatar Dec 07 '23 19:12 mrkog3

Thank you @zlr-raja for this great code! I was wondering whether you have already tried to adapt the code for other formats, or are thinking of doing so?

theafgov avatar Feb 21 '24 07:02 theafgov

Hello, how do we use this with the latest version of https://github.com/Azure-Samples/azure-search-openai-demo? It seems this pull request uses the Flask API?

javapro13 avatar Mar 04 '24 05:03 javapro13

You would need to do a fair bit of merging. We are planning to add document-upload as a feature of the repo itself, but it will include access control considerations, so it is taking more time to properly implement.

pamelafox avatar Mar 04 '24 18:03 pamelafox

Based on @zlr-raja's work, I've forked his code and updated it so that it works with the current Azure infrastructure and GPT-4.

https://github.com/beouk/azure-search-openai-demo/tree/uploadFileFromBrowser

PR: https://github.com/zlr-raja/azure-search-openai-demo/pull/1

Note: It still uses his original fork, so it is 8+ months behind the current repo. Only use it if file uploads are critical to your project.

Hoping to work on a fork that's merged with the current repo or delete it when upload functionality with auth is added.

You will still need to grant your app extra roles as above.

ntabernacle avatar Mar 08 '24 15:03 ntabernacle

@beouk I am working on a branch based on the current version of the repo. It will use a slightly different architecture than this one (using a function for data ingestion) but similar frontend. I'll try to send it in its WIP state today so folks can see it.

pamelafox avatar Mar 08 '24 17:03 pamelafox

@pamelafox that would be very very helpful

javapro13 avatar Mar 08 '24 17:03 javapro13

You can see the proposed design here: https://github.com/Azure-Samples/azure-search-openai-demo/issues/1393

pamelafox avatar Mar 08 '24 18:03 pamelafox

Hello @beouk I cloned the repo you linked but when I try to deploy it using azd up, it keeps giving the error below. I tried playing around with gpt-35-turbo to reduce the quota usage but nothing works... Please let me know if you have a solution or if I am doing something wrong. Thanks!

ERROR: deployment failed: failing invoking action 'provision', error deploying infrastructure: deploying to subscription:

Deployment Error Details: InvalidTemplateDeployment: The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '0a49ab3d-56cc-42cd-8af3-04600838be43'. See inner errors for details. InsufficientQuota: This operation require 30 new capacity in quota Tokens Per Minute (thousands) - GPT-35-Turbo, which is bigger than the current available capacity 10. The current quota usage is 230 and the quota limit is 240 for quota Tokens Per Minute (thousands) - GPT-35-Turbo.

TraceID: ce981031c409dcfe8c3f4de3a3265f0b

ERROR: error executing step command 'provision': deployment failed: failing invoking action 'provision', error deploying infrastructure: deploying to subscription:

Deployment Error Details: InvalidTemplateDeployment: The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '0a49ab3d-56cc-42cd-8af3-04600838be43'. See inner errors for details. InsufficientQuota: This operation require 30 new capacity in quota Tokens Per Minute (thousands) - GPT-35-Turbo, which is bigger than the current available capacity 10. The current quota usage is 230 and the quota limit is 240 for quota Tokens Per Minute (thousands) - GPT-35-Turbo.

TraceID: ce981031c409dcfe8c3f4de3a3265f0b

sanskartewatia avatar Mar 09 '24 21:03 sanskartewatia

@sanskartewatia You've run out of OpenAI quota in your Azure region. You either need to add your OpenAI environment variables or delete some OpenAI model deployments to make room for provisioning one in this project.

ntabernacle avatar Mar 09 '24 22:03 ntabernacle
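
The numbers in the InsufficientQuota error above follow from simple arithmetic: available capacity is the quota limit minus current usage, and the deployment fails when the requested capacity exceeds it. A small sketch using the figures from the error message (the function names are hypothetical, and units are thousands of tokens per minute):

```python
def available_capacity(limit: int, usage: int) -> int:
    """Capacity still free in the region: quota limit minus current usage."""
    return limit - usage


def can_deploy(requested: int, limit: int, usage: int) -> bool:
    """A deployment fits only if the request does not exceed the free capacity."""
    return requested <= available_capacity(limit, usage)


def capacity_to_free(requested: int, limit: int, usage: int) -> int:
    """How much capacity other deployments must release before the request fits."""
    return max(0, requested - available_capacity(limit, usage))


# Figures from the error: request 30, limit 240, usage 230.
print(available_capacity(240, 230))    # 10
print(can_deploy(30, 240, 230))        # False
print(capacity_to_free(30, 240, 230))  # 20
```

In other words, existing deployments in that region would need to release 20 units of capacity, or the new deployment would need to request 10 or fewer, or a region with spare quota would have to be used.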

@beouk Thanks for the quick response. I selected a different region, and the repo (https://github.com/beouk/azure-search-openai-demo.git) is deployed (https://app-backend-ralp2blkzcdvq.azurewebsites.net/), but there is no upload button on the ask page or the chat page. Could you please share the link to the code that has this upload functionality working? This feature is critical for my task right now, so even if it is 8 months old, it's fine. Thanks

sanskartewatia avatar Mar 09 '24 23:03 sanskartewatia

@sanskartewatia Apologies, you'll need the upload branch too: https://github.com/beouk/azure-search-openai-demo/tree/uploadFileFromBrowser

ntabernacle avatar Mar 10 '24 10:03 ntabernacle

@beouk Thanks for sharing the link, after deploying this, there is no upload button here either.

sanskartewatia avatar Mar 10 '24 22:03 sanskartewatia

Hmm the code is there, did you restart?

ntabernacle avatar Mar 11 '24 09:03 ntabernacle

Hello @beouk, after all the resources have been created, when I run azd deploy I get the following error. Could you please tell me whether I have to change some filename in the code? Thanks for your help.

(x) Failed: Deploying service backend

ERROR: getting target resource: resource not found: unable to find a resource tagged with 'azd-service-name: backend'. Ensure the service resource is correctly tagged in your infrastructure configuration, and rerun provision

sanskartewatia avatar Mar 12 '24 00:03 sanskartewatia

If you're using my branch, you will need to enable the upload feature:

azd env set USE_USER_UPLOAD true

However, it might be easier to wait for the feature to be finalized, since I'll have docs ready then.

pamelafox avatar Mar 12 '24 00:03 pamelafox

You'll also need to enable auth and ACL following our existing docs, as this feature depends on those being enabled.

pamelafox avatar Mar 12 '24 00:03 pamelafox