Using storage table for knowledge store still seems to require storage account keys even when using managed identity, despite the documentation
I am analyzing the Cognitive Search service and found the following. In https://learn.microsoft.com/en-us/azure/search/search-howto-managed-identities-data-sources?tabs=portal-sys%2Cportal-user#assign-a-role I am told that for authentication to storage tables used as a data source, I need the "reader with data access" role when using a managed identity. As far as I can tell from testing, that's because the service uses the role to get account keys, not for actual storage access. However, the table in that section also says that using tables for a knowledge store is different: I need "Storage Table Contributor", and no key access is necessary. I have tested both scenarios, and it seems to me that both require "reader with data access".

More specifically, I created a Cognitive Search service and a storage account with public network access disabled, enabled trusted service access (and also tried creating a shared private link), assigned a system-assigned managed identity to the search service, and granted it the "Storage Table Contributor" role on the storage account. I created a fake data source, skillset, and indexer, where the skillset pointed to the storage account as a knowledge store, projecting to tables. Running the indexer didn't work: it claimed it cannot read account keys with my managed identity and that I need "reader with data access", the same message I'd get when using storage tables as a data source.

I confirm that projecting to blobs works, and also that removing the knowledge store makes the indexer work, so it's not a misconfiguration of the data source. Is it a bug in the search service or in the documentation?
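For readers trying to reproduce this, the knowledge store setup described above might look roughly like the following sketch. It only builds the `knowledgeStore` section of a skillset definition as a Python dict. The resource ID, table name, container name, and source paths are hypothetical placeholders, and the `ResourceId=...;` connection string form is assumed from the managed identity documentation. The only difference between the working and failing case is whether the projection goes under `objects` or `tables`.

```python
import json


def knowledge_store(resource_id: str, use_tables: bool) -> dict:
    """Build the knowledgeStore section of a skillset definition.

    All names below (demoTable, demo-container, source paths) are
    hypothetical placeholders, not values from the original report.
    """
    projection = {"tables": [], "objects": [], "files": []}
    if use_tables:
        # Table projection: the case that failed with a managed identity.
        projection["tables"].append({
            "name": "demoTable",
            "generatedKeyName": "DocumentId",
            "source": "/document/tableprojection",
        })
    else:
        # Object (blob) projection: the case reported as working.
        projection["objects"].append({
            "storageContainer": "demo-container",
            "source": "/document/objectprojection",
        })
    return {
        # Managed identity connection string: no account key included.
        "storageConnectionString": f"ResourceId={resource_id};",
        "projections": [projection],
    }


if __name__ == "__main__":
    rid = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
           "/providers/Microsoft.Storage/storageAccounts/<account-name>")
    print(json.dumps(knowledge_store(rid, use_tables=True), indent=2))
```

Both variants use the same connection string, which matches the report that only the projection type was changed between runs.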
Document Details
⚠ Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.
- ID: 3a54c00f-9e78-d4d9-cc62-1ec2b05e320e
- Version Independent ID: a0a19016-ccc1-43f9-7ac7-317f4ebd8b1c
- Content: Connect using a managed identity - Azure Cognitive Search
- Content Source: articles/search/search-howto-managed-identities-data-sources.md
- Service: cognitive-search
- GitHub Login: @HeidiSteen
- Microsoft Alias: heidist
Hi @webczat, in the "Assign roles" section on step 4, there's a table that lists which roles are needed for different tasks. If you need read permissions, you'll need to pick one of the reader roles. Roles are cumulative. I can update that step to mention "Choose multiple roles if you need both read and write permissions." Would that fix the problem, or am I missing something?
@webczat Thanks for your feedback! We will investigate and update as appropriate.
No, what I meant was that my tests say the table contains invalid information, unless it's a different kind of problem. According to the table, for example, to allow the search service to read/write blobs for the purpose of storing a knowledge store, I can assign the "Storage Blob Data Contributor" role. That setting works correctly and the knowledge store is updated. However, the same table says that for storage tables I need "Storage Table Contributor", which would make perfect sense for permissions to read/write tables with a managed identity. That didn't work, and the error message mentioned that the service tried to access account keys with the managed identity, which according to the table it shouldn't have tried. The connection strings for both cases were identical; I just changed the projections to use tables instead of objects. Maybe this requires some change in the connection string, but the docs don't mention that and I don't know enough about their syntax.
also, note that contributor roles include read permissions.
@webczat, you mentioned that you "enabled trusted service access". Is Azure Storage behind a firewall and in the same region as Search, and did you enable the "Allow trusted Microsoft services to access this storage account" in Azure Storage?
If yes, that option only works for search connections to blob and ADLS Gen2 storage. It's not supported for Table Storage or File Storage yet. This is the article for that scenario: https://learn.microsoft.com/en-us/azure/search/search-indexer-howto-access-trusted-service-exception
Yes, this is a same-region storage account. I was aware of the limitation mentioned in the docs, so one of the reasons I ran the test was to confirm whether that's still true. The problem is, even after adding a shared private link to table storage, or even after disabling the firewall and enabling public network access on this storage account, the behavior didn't seem to change at all.
You removed all network protections, but you still can't connect to table storage using a managed identity? That would be a bug. I'll test on my side, but I won't be able to get to it right away. This is the article that explains how to connect to Azure Storage using a managed identity in a public network scenario: https://learn.microsoft.com/en-us/azure/search/search-howto-managed-identities-storage
It's a pretty basic article that includes some redundant information -- the only thing that's new/different in this content is the connection string and role assignments. The other sections (create index, create indexer) are provided as a convenience.
I mean, there is a difference, at least according to the documentation, between using a storage table data source and using a storage table knowledge store. In the first case, the service doesn't use the managed identity for data access but instead retrieves storage account keys, which are then used to authenticate, and the docs seem to confirm this by asking to assign the "reader with data access" role. I can confirm that scenario seems to work as expected, although I couldn't fully test it because using storage account keys is denied in our environment; proving it works would not make sense considering I'm going to forbid that usage anyway. However, when trying a knowledge store using table storage, the docs say the "Storage Table Contributor" role is needed, which should mean the managed identity is used to authenticate to the storage itself, and not to retrieve the almighty account keys. Yet the error I get when it fails is identical to the one I get when I try to use tables as a data source without granting the "reader with data access" role: it claims it's using the MI to get account keys, but fails. It might be that the error message is wrong, or that I've made a mistake without realizing it, but at first glance it seems to be either a documentation bug (table storage works the same for both the data source and knowledge store cases and requires account keys even when instructed to use a managed identity) or a service bug.
Cognitive Search doesn't retrieve the storage account keys, ever. The only keys of any kind that it retrieves are the encryption keys from Azure Key Vault (assuming you're using double encryption). For key-based authentication for access to storage, you as the developer have to pass them in the connection string. Otherwise, if you're using an identity, all authentication and authorization is through Azure Active Directory -- no keys involved at all.
For indexing, it's purely read access 100%. Search connects and reads in data from storage, but there is no write back. If you go beyond indexing to use other features -- debug sessions, knowledge store, enrichment caching -- that's when write permissions are needed.
Using the RBAC support in Cognitive Search, you should be able to completely do away with key-based authentication. An alternative to key-based auth was the primary motivation for adding RBAC support.
Hmm, I think that's not true for table access, at least in the data source case. The table at https://learn.microsoft.com/en-us/azure/search/search-howto-managed-identities-data-sources?tabs=portal-sys%2Cportal-user#assign-a-role says that "reader with data access" is the appropriate role to grant read access for Azure Files and Table Storage (as in using them as data sources). Note that Azure Files does not support standard AAD authentication at all, even when using HTTP/REST APIs. At this time I do not have access to any Azure environment until I return to work, so I cannot prove this and paste the role definition here, but "reader with data access" is a reader role which additionally has the Microsoft.Storage/storageAccounts/listKeys/action permission. The Reader role has no data access in the sense of storage RBAC permissions, at all. That means the only way the "reader with data access" role grants access is through storage account keys.

I tried to use tables as a data source for an indexer without granting this role; instead the identity had "Storage Table Contributor" only, same as in the knowledge store case. The error I got explicitly mentioned that it failed to use the managed identity to get storage account keys. It also suggested that I use the "reader with data access" role and pointed me to the above table. Note I don't have the exact error message at hand, which is why I don't just quote it. But it confirms that the service does use account keys, using the managed identity to retrieve them, at least in that case. And my claim is that this also seems to be true for the other case, that is, the knowledge store, despite the documentation saying otherwise.
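The distinction between the two kinds of role can be sketched in a few lines of Python. The role definitions below are abbreviated and reproduced from memory, so treat them as assumptions and verify them with `az role definition list --name "Reader and Data Access"`. The built-in table role is spelled "Storage Table Data Contributor" here, which is how I recall it being listed; this thread refers to it as "Storage Table Contributor". The `access_path` helper is hypothetical and only illustrates the reasoning: a role with no `dataActions` can reach table data only indirectly, by listing account keys.

```python
# Abbreviated role definitions (assumed, not copied from a live tenant).
ROLES = {
    "Reader and Data Access": {
        "actions": [
            "Microsoft.Storage/storageAccounts/listKeys/action",
            "Microsoft.Storage/storageAccounts/ListAccountSas/action",
            "Microsoft.Storage/storageAccounts/read",
        ],
        "dataActions": [],
    },
    "Storage Table Data Contributor": {
        "actions": [
            "Microsoft.Storage/storageAccounts/tableServices/tables/read",
            "Microsoft.Storage/storageAccounts/tableServices/tables/write",
            "Microsoft.Storage/storageAccounts/tableServices/tables/delete",
        ],
        "dataActions": [
            "Microsoft.Storage/storageAccounts/tableServices/tables/entities/read",
            "Microsoft.Storage/storageAccounts/tableServices/tables/entities/write",
            "Microsoft.Storage/storageAccounts/tableServices/tables/entities/delete",
        ],
    },
}


def access_path(role: dict) -> str:
    """Classify how a role can reach table data (hypothetical helper)."""
    if role["dataActions"]:
        # Data-plane permissions: the identity itself is authorized.
        return "data-plane RBAC"
    if any(a.lower().endswith("/listkeys/action") for a in role["actions"]):
        # Control-plane only: data is reachable solely via account keys.
        return "account keys"
    return "none"


if __name__ == "__main__":
    for name, definition in ROLES.items():
        print(f"{name}: {access_path(definition)}")
```

Under these assumptions, an error that asks for "Reader and Data Access" implies the key-based path, which is the point being made above.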
hi @webczat, I can repro the issue and it's a product bug. I'll file a ticket. Sorry for the extended dialogue about unrelated security features. Had I tested it right away, I would have seen it immediately. I'm just very surprised because I'm certain this used to work.
For now, assume that you can't use Azure roles and managed identities for table projections in a knowledge store. I would either avoid table projections for now, or use a full access connection string to access Azure Storage (a full access connection string has an account key).
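As a rough illustration of the two options, the sketch below tells a full access connection string (which carries an `AccountKey`) apart from the managed identity form (`ResourceId=...;`). The parsing helper and the sample values are hypothetical; the field names follow the usual Azure Storage connection string layout, and the `ResourceId` form is assumed from the managed identity docs.

```python
def parse_connection_string(value: str) -> dict:
    """Split 'Key=Value;Key=Value' into a dict with lowercase keys."""
    fields = {}
    for part in value.split(";"):
        if "=" not in part:
            continue  # skip empty or malformed segments
        # Split on the first '=' only: base64 keys end with '=' padding.
        key, val = part.split("=", 1)
        fields[key.strip().lower()] = val
    return fields


def auth_mode(connection_string: str) -> str:
    """Report which credential type a connection string relies on."""
    fields = parse_connection_string(connection_string)
    if "accountkey" in fields:
        return "account-key"
    if "sharedaccesssignature" in fields:
        return "sas"
    if "resourceid" in fields:
        return "managed-identity"
    return "unknown"


if __name__ == "__main__":
    full_access = ("DefaultEndpointsProtocol=https;AccountName=<account-name>;"
                   "AccountKey=<account-key>;EndpointSuffix=core.windows.net")
    identity = ("ResourceId=/subscriptions/<subscription-id>/resourceGroups/"
                "<resource-group>/providers/Microsoft.Storage/"
                "storageAccounts/<account-name>;")
    print(auth_mode(full_access))  # account-key
    print(auth_mode(identity))     # managed-identity
```

A check like this could be used in a deployment pipeline to flag knowledge store definitions that embed account keys, if key usage is restricted by policy.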
A ticket has been filed so there are no further action items on this GH issue. Thank you for finding the problem and raising the issue! I appreciate your efforts on this.
#please-close
Just to confirm: after the fix, will managed identity authentication work for knowledge store projections, while the key, used directly or retrieved by the MI, remains the only access method when using tables as data sources, as the documentation currently says? Or will both cases work?
After the fix, you'll be able to project to tables in a knowledge store using a managed identity. I can't say whether the permissions will continue to be "Reader and Data Access" (which uses keys, as you already noted) or if it will switch to "Storage Table Data Reader"/"Storage Table Contributor". The person who can most reliably speak to the roadmap won't be back until after the holidays, but for now, I would assume that the status quo ("Reader and Data Access") is how the managed identity will connect to Table Storage.
I see now that I introduced a doc bug on my last edit to the permission/role table for table projections (it looks like I said that "Storage Table Contributor" is used for kstore table projections). I'll fix that right now to say "Reader and Data Access".
I mean... you have surprised me, because it seems like the managed identity case for knowledge store projections does not work at all, even with the "reader and data access" role? Is that the case? Note I haven't tested whether it does; sorry if it sounded otherwise from my previous comments. I just granted "Storage Table Contributor" to the MI and tried to make a table-based knowledge store, and then got a message stating it requires "reader and data access", despite what the documentation said. My claim was not that granting it "reader and data access" didn't work. I just felt it did not make sense to grant the role and retest, because any direct or indirect usage of storage access keys is denied by my customer's policy, with the possible exception of SAS in some cases. Using the MI with "reader and data access" is an indirect key usage here. The reason for the issue was that after testing I was not sure which behavior is intended. It might be a product bug, and it should be able to work with "Storage Table Contributor", or it might be a documentation bug, and it's intended that it doesn't.
Table projections failed for me using "reader and data access" and a system-managed identity. I'm pretty sure this is a regression. The bug is still unassigned, and I'm not sure if it's been triaged yet. I didn't make it a pri 0 so it might sit a few days before someone looks at it.
Interesting, I didn't expect it to additionally not work. I rather assumed it does, just in a way which is not satisfactory from my PoV and would require me to recommend denying usage of tables in all contexts related to Cognitive Search (which I did before reporting this issue). That means I accidentally made you discover an additional bug which I myself didn't expect, which is kinda crazy.
You can still index data from Azure Table Storage if you use what we call the "push model" of indexing, where your client code pushes a JSON payload for indexing. Push indexing is described here: https://learn.microsoft.com/en-us/azure/search/search-what-is-data-import#pushing-data-to-an-index
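A minimal sketch of what the push model could look like for table data, assuming the entities have already been read from Table Storage by your own client code. It only builds the JSON batch; the index key field name `id`, the sample entity, and the key encoding scheme are hypothetical choices. The URL-safe base64 encoding is there because document keys allow only a restricted character set (letters, digits, dashes, underscores, and equal signs).

```python
import base64
import json


def to_search_doc(entity: dict) -> dict:
    """Turn one table entity into an indexing action (hypothetical mapping)."""
    # Combine PartitionKey and RowKey into a single key-safe document ID.
    raw = f"{entity['PartitionKey']}|{entity['RowKey']}".encode("utf-8")
    doc = {k: v for k, v in entity.items()
           if k not in ("PartitionKey", "RowKey")}
    doc["id"] = base64.urlsafe_b64encode(raw).decode("ascii")
    # mergeOrUpload updates an existing document or creates a new one.
    doc["@search.action"] = "mergeOrUpload"
    return doc


def build_batch(entities: list) -> dict:
    """Wrap the documents in the 'value' array the indexing API expects."""
    return {"value": [to_search_doc(e) for e in entities]}


if __name__ == "__main__":
    sample = [{"PartitionKey": "p1", "RowKey": "r1", "title": "hello"}]
    print(json.dumps(build_batch(sample), indent=2))
```

The resulting payload would be sent with a POST to `https://<service-name>.search.windows.net/indexes/<index-name>/docs/index?api-version=<api-version>`. Because the request is inbound to the search service, the search service itself never needs credentials for the storage account.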
The security issue (using storage keys) is applicable only for outbound requests initiated by Search, targeting Storage. Outbound requests made by the search service consist of 1) indexers connecting to and reading in data, and 2) AI enrichment/skillset related features (for table storage, it would be strictly limited to table projections in a knowledge store).
yeah, that's fair. I am aware of that possibility and that's the alternative I proposed. Thank you very much for your help!