azure-search-openai-demo
"The token is expired" with prepdocs.ps1 with ADLS Gen2.
This issue is for a: (mark with an x)
- [X] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Manually create an ADLS Gen2 storage blob and upload a few test files (I have only 7 files, a mix of doc, docx and pdf). Configure all AZURE_ADLS_GEN2 env variables to point to this storage. Also configure AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_CONTAINER to point to the same blob. Manually add an existing AD group to the ACL of every file in the blob (I used Azure Storage Explorer for this).
My config for storage:
- AZURE_ADLS_GEN2_FILESYSTEM="windchill-sample"
- AZURE_ADLS_GEN2_FILESYSTEM_PATH="/confidential-rd"
- AZURE_ADLS_GEN2_STORAGE_ACCOUNT="bochatstoregen2"
- AZURE_STORAGE_ACCOUNT="bochatstoregen2"
- AZURE_STORAGE_CONTAINER="windchill-sample"

And other relevant env settings:
- AZURE_USE_AUTHENTICATION="true"
- AZURE_ENABLE_GLOBAL_DOCUMENT_ACCESS="true"
- AZURE_ENABLE_UNAUTHENTICATED_ACCESS="true"
- AZURE_ENFORCE_ACCESS_CONTROL="false"
- USE_FEATURE_INT_VECTORIZATION="true"

Embedding settings (this requires upgrading to the latest API, as described here: #1825):
- AZURE_OPENAI_EMB_DEPLOYMENT="text-embedding-3-large"
- AZURE_OPENAI_EMB_DEPLOYMENT_CAPACITY=350
- AZURE_OPENAI_EMB_DEPLOYMENT_VERSION=1
- AZURE_OPENAI_EMB_DIMENSIONS=1536
- AZURE_OPENAI_EMB_MODEL_NAME="text-embedding-3-large"
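For anyone trying to reproduce this, a quick way to confirm that the blob and Data Lake settings really point at the same place is sketched below. This is only an illustration: `check_storage_config` is a hypothetical helper, not part of the repo, and the pairing rule it checks (account equals account, container equals filesystem) is simply what the reproduction steps above describe.

```python
def check_storage_config(env):
    """Return a list of mismatches between the blob and ADLS Gen2 settings.

    Hypothetical helper: the repo does not enforce these rules. They only
    mirror the setup in this report, where both sets of variables are meant
    to point at the same storage.
    """
    pairs = [
        ("AZURE_STORAGE_ACCOUNT", "AZURE_ADLS_GEN2_STORAGE_ACCOUNT"),
        ("AZURE_STORAGE_CONTAINER", "AZURE_ADLS_GEN2_FILESYSTEM"),
    ]
    problems = []
    for blob_key, adls_key in pairs:
        blob_value = env.get(blob_key)
        adls_value = env.get(adls_key)
        if blob_value is None or adls_value is None:
            problems.append(f"{blob_key} or {adls_key} is not set")
        elif blob_value != adls_value:
            problems.append(
                f"{blob_key}={blob_value!r} differs from {adls_key}={adls_value!r}"
            )
    return problems


# Values copied from this report.
REPORTED_ENV = {
    "AZURE_ADLS_GEN2_FILESYSTEM": "windchill-sample",
    "AZURE_ADLS_GEN2_FILESYSTEM_PATH": "/confidential-rd",
    "AZURE_ADLS_GEN2_STORAGE_ACCOUNT": "bochatstoregen2",
    "AZURE_STORAGE_ACCOUNT": "bochatstoregen2",
    "AZURE_STORAGE_CONTAINER": "windchill-sample",
}

if __name__ == "__main__":
    for problem in check_storage_config(REPORTED_ENV):
        print(problem)
```

In practice one would pass `dict(os.environ)` or the parsed azd environment file instead of the hard-coded dictionary.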
Do azd up or just run prepdocs.py
Any log messages given by the failure
Running "prepdocs.py"
./app/backend/prepdocs.py "C:\Programming\bochat9/data/" --verbose --subscriptionid XXXXXXXXXXXX --storageaccount bochatstoregen2 --container windchill-sample --storageresourcegroup BOCHAT-RG --searchservice gptkb-trnz2cbjn4ofs --index windchill-index8 --openaihost "azure" --openaimodelname "text-embedding-3-large" --openaidimensions 1536 --openaiservice "cog-se-trnz2cbjn4ofs" --openaideployment "text-embedding-3-large" --openaicustomurl "" --openaiorg "" --documentintelligenceservice cog-di-trnz2cbjn4ofs --datalakestorageaccount bochatstoregen2 --datalakefilesystem windchill-sample --datalakepath /confidential-rd --tenantid XXXXXXXXXXXX --useacls --useintvectorization true
Using Data Lake Gen2 Storage Account: bochatstoregen2
Ensuring search index windchill-index8 exists
Creating windchill-index8 search index
Search indexer data source connection updated.

Got an error while reading confidential-rd//0095-17-2-00018-03_BOM_rev_B.xlsx -> Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:887a5c57-601e-0002-292e-dd7627000000
Time:2024-07-23T18:30:34.1625633Z
ErrorCode:InvalidAuthenticationInfo
authenticationerrordetail:Lifetime validation failed. The token is expired.
Content: <Error><Code>InvalidAuthenticationInfo</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header. RequestId:887a5c57-601e-0002-292e-dd7627000000 Time:2024-07-23T18:30:34.1625633Z</Message><AuthenticationErrorDetail>Lifetime validation failed. The token is expired.</AuthenticationErrorDetail></Error> --> skipping file

Got an error while reading confidential-rd/*/ABBCABOM-01176 - procedure integration - Objectif L0_19X.ppt -> Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:8e311dc3-b01e-0073-0d36-dd900c000000
Time:2024-07-23T19:25:39.7029245Z
ErrorCode:InvalidAuthenticationInfo
authenticationerrordetail:Lifetime validation failed. The token is expired.
Content: <Error><Code>InvalidAuthenticationInfo</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header. RequestId:8e311dc3-b01e-0073-0d36-dd900c000000 Time:2024-07-23T19:25:39.7029245Z</Message><AuthenticationErrorDetail>Lifetime validation failed. The token is expired.</AuthenticationErrorDetail></Error> --> skipping file
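To get a feel for how long the run had been going between failures, the `Time:` fields in the log above can be pulled out and compared. The snippet below is a rough sketch using only the standard library; `error_times` and `SAMPLE_LOG` are names made up for this illustration, and the sample is a trimmed copy of the two errors quoted above. It only measures the gap between the two logged failures and does not, by itself, show which credential expired or when it was issued.

```python
import re
from datetime import datetime, timezone

# Matches the "Time:2024-07-23T18:30:34.1625633Z" fields in the storage errors.
TIME_RE = re.compile(r"Time:(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def error_times(log_text):
    """Return the distinct UTC timestamps found in the log, in order of appearance."""
    seen = []
    for match in TIME_RE.finditer(log_text):
        base, fraction = match.group(1), match.group(2) or "0"
        # The service logs 7 fractional digits; datetime keeps at most 6.
        microseconds = int(fraction[:6].ljust(6, "0"))
        stamp = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(
            microsecond=microseconds, tzinfo=timezone.utc
        )
        if stamp not in seen:  # each error repeats its timestamp in the XML body
            seen.append(stamp)
    return seen


# Trimmed copy of the two errors from this report.
SAMPLE_LOG = (
    "RequestId:887a5c57-601e-0002-292e-dd7627000000 Time:2024-07-23T18:30:34.1625633Z "
    "ErrorCode:InvalidAuthenticationInfo ... Time:2024-07-23T18:30:34.1625633Z</Message> "
    "RequestId:8e311dc3-b01e-0073-0d36-dd900c000000 Time:2024-07-23T19:25:39.7029245Z "
    "ErrorCode:InvalidAuthenticationInfo ... Time:2024-07-23T19:25:39.7029245Z</Message>"
)

if __name__ == "__main__":
    first, last = error_times(SAMPLE_LOG)
    print("gap between logged failures:", last - first)
```

On the two errors quoted here the gap comes out at a little over 55 minutes, so the script was still running and still failing almost an hour after the first rejected request.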
Expected/desired behavior
While writing this bug report, I think I have understood what is going on. I have another very large blob containing thousands of files. I experienced a similar bug a few days/weeks ago and found a workaround: I first ran prepdocs.ps1 against a very small blob so that it wouldn't time out and would successfully create the index and indexer. Then I uploaded my thousands of files to that blob and ran the indexer manually in the portal. This worked. Now it looks like prepdocs is somehow stuck sourcing the data from this other large blob. The file named in the error (0095-17-2-00018-03_BOM_rev_B.xlsx) is not part of my "windchill-sample" blob.
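One way to double-check this from the log alone is sketched below: extract every path that prepdocs failed to read and flag the ones that cannot belong to the sample, which according to the reproduction steps holds only doc, docx and pdf files. The helper names and `SAMPLE_LOG` are invented for this illustration, and the extension filter is just a heuristic based on that description, not a definitive test of where a file lives.

```python
import posixpath
import re

# Captures the path between "Got an error while reading " and " -> ".
READ_ERROR_RE = re.compile(r"Got an error while reading (.+?) -> ")

# Assumption taken from the reproduction steps: the sample holds only these types.
EXPECTED_EXTENSIONS = {".doc", ".docx", ".pdf"}


def failed_paths(log_text):
    """Return every path prepdocs reported it could not read."""
    return READ_ERROR_RE.findall(log_text)


def unexpected_paths(log_text, expected=EXPECTED_EXTENSIONS):
    """Return failed paths whose extension is not one the sample contains."""
    return [
        path
        for path in failed_paths(log_text)
        if posixpath.splitext(path)[1].lower() not in expected
    ]


# Trimmed copy of the two "Got an error" lines from this report.
SAMPLE_LOG = (
    "Got an error while reading confidential-rd//0095-17-2-00018-03_BOM_rev_B.xlsx"
    " -> Server failed to authenticate the request. "
    "Got an error while reading confidential-rd/*/ABBCABOM-01176 - procedure"
    " integration - Objectif L0_19X.ppt -> Server failed to authenticate the request."
)

if __name__ == "__main__":
    for path in unexpected_paths(SAMPLE_LOG):
        print("not expected in the sample:", path)
```

Both failing files in this report (an .xlsx and a .ppt) are flagged, which is consistent with them coming from somewhere other than the 7-file sample.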
OS and Version?
Windows 11
azd version?
azd version 1.9.5 (commit cd2b7af9995d358aab33c782614f801ac1997dde)
Versions
I'm using the very latest version (as of July 23rd) but forked with my own changes (again, see #1825 for the relevant ones).