trino
trino copied to clipboard
Add most properties to Identity's hashCode and equals
Description
The rationale is that, in addition to the user name, all of these can affect access control decisions for an Identity, so they should be considered distinct if any of those attributes differ.
The StatementAnalyzer, during the analysis stage, collects all table references along with their corresponding access controls and identities. At the end of the analysis, this is used to do the access checks. They are collected in a Map, where the key is the pair of access control and Identity, so that we deduplicate the necessary checks. But for this deduplication to work properly, Identity instances that result in different accesses should be considered distinct.
In case of views, if we need to add and additional reference, we add it with a changed access control, so the key is properly deduplicated. But a table reference is also added for column masks and row filters. For these we get a ViewExpression, which contains only the user name. From that we have to construct a proper Identity. Previously it was entirely "synthetic" and included no roles; if the current session has roles enabled, this now makes it distinct. If, then, access to the table is granted for the role, not for the user, then that Identity will fail the access checks and the query will fail.
The simple trick to solve this is to return the session's Identity if the mask's or filter's user name is equal to the current session's user. A more proper solution would make access controls return a full Identity the same way it returns it for views with SECURITY DEFINER.
Non-technical explanation
Fixing a possible source of bugs.
Release notes
(x) This is not user-visible and no release notes are required. ( ) Release notes are required, please propose a release note for me. ( ) Release notes are required, with the following suggested text:
# Section
* Fix some things. ({issue}`issuenumber`)
@dain, I can see that there is a class io.trino.security.AccessControlUtil.FullIdentityEquality that is kind of used to achieve the same thing. It raises a couple of questions:
- Why was it added as a separate class instead of changing
Identity#equals? - Should it be used more widely in places where this matters?
- Or, should I include more attributes in this PR and remove
io.trino.security.AccessControlUtil.FullIdentityEquality?- Im 90% certain that
catalogRolesshould also be compared, but I'm not sure about theprincipal...
- Im 90% certain that
Also, do you have any idea how this can be tested? :)
Column masks break with this change :/ (No, apparently we don't have enough coverage to detect this.)
I think we should fallow FullIdentityEquality. Move the semantic to Identity and remove FullIdentityEquality.
In regards to testing I would propose the below: Given: User U has roles A and B. User U is an an owner of view V. V is selecting from table T. Role A has access to table T. Then: User U with role B enabled selects from V.
If I understand correctly, previously it should fail as we would use U with role B to select from T.
Column masks break with this change
Can you please elaborate?
Column masks break with this change
Can you please elaborate?
I was still investigating, while also learning way more about the analyzer than I wanted ;)
When we analyze the FROM clause, we use the current session to add the table reference. That session has all the enabled roles present. When there is also a mask on that table, we get it as a ViewExpression, which contains an identity as a String - basically just a user name. I don't know why we do that, really. All implementations of (system) access controls I know of just put the current session's user name there. Is there a possibility that a mask will be evaluated as a different user? Anyway, we take that user name and create a separate Identity for it, which does not have any enabled roles. It does not .equals() the one that we put for the table before, so it gets inserted as a separate entry. So at the end we do the access check for the table twice - once with an identity with enabled roles, and once without enabled roles. If the user gains access permissions via grants to the role, not to the user, then the second access check will fail.
When there is also a mask on that table, we get it as a
ViewExpression, which contains an identity as aString- basically just a user name
Can we change that to make it a full Identity?
(Yes, that's an SPI compat break)
ViewExpression, which contains an identity as a String
This is like for any other view that has user owner only. So then you need to use all the roles of given user to evaluate the mask or filter. Masking or filtering is very similar to views.
I don't know why we do that,
I am not sure if I know what you don't know. It is because the fact the evaluation may require more access than the actual user. For example to filter data in one table you access the other table that is only visible to admins. Also this is required if you have multiple masks, filters or mix. Then if you mask something and then you want to use the original value to filter you cannot do it as a session user as you would see the masked value.
All implementations of (system) access controls I know of just put the current session's user name there.
Yes. I think it is the limitation of the implementation of the plugin, not the engine. As you can see your plugins could do more.
Anyway, we take that user name and create a separate Identity for it, which does not have any enabled roles.
I think that is a mistake we should use all roles of given user. You can easily use https://github.com/trinodb/trino/pull/14130#issuecomment-1247957840 to generate issues, just replace a view with mask or filter. Please post tests for both.
Can we change that to make it a full Identity?
That would be cool. But one step a time.
In regards to testing I would propose the below: Given: User U has roles A and B. User U is an an owner of view V. V is selecting from table T. Role A has access to table T. Then: User U with role B enabled selects from V.
If I understand correctly, previously it should fail as we would use U with role B to select from T.
I think this would work in #13991, but nor here :)
You are right. Next try:
You can use io.trino.metadata.SystemSecurityMetadata#getViewRunAsIdentity.
SELECT * FROM t, v, where v is SELECT * FROM t. User U has roles A which is able to read from t and it is a session user. getViewRunAsIdentity return just U without any role so it should fail (as view should not be able to read from t) while now it will pass.
Add enabled roles and groups to Identity's identity
Funny.
I'd call this "Include enable roles and groups in Identity equals" not as nice English, but avoid being overly playful
...and hashCode.
Sure, I can change that.
You can use
io.trino.metadata.SystemSecurityMetadata#getViewRunAsIdentity.
SELECT * FROM t, v, where v isSELECT * FROM t. User U has roles A which is able to read from t and it is a session user.getViewRunAsIdentityreturn justUwithout any role so it should fail (as view should not be able to read from t) while now it will pass.
I don't think this can be triggered for views, because when a view is involved, we also change the implementations of access control, which is definitely distinct. But it indirectly shows with column masks and row filters, which also generate table references. I had to change a couple of tests to make that evident, though.
ci hit: #13946
At the end of the analysis, this is used to do the access checks. They are collected in a Map, where the key is the pair of access control and Identity, so that we deduplicate the necessary checks. But for this deduplication to work properly, Identity instances that result in different accesses should be considered distinct.
I don't understand why we would do this. The security system should cache access control checks in the query and reuse them. Then you don't need to dedupe. This is a better design as the keys for the cache are security system sensitive. Some security systems only look at roles, some only user id, some only extra credentials, and some are just fixed rules for everyone. This means that if the security system is not caching the policies in the query, then you could get called multiple times for the parts of the identity that you use, and do multiple policy lookups getting different views of the policy. Therefore, the deduplication in the engine is just useless extra work.
The security system should cache access control checks in the query and reuse them.
Can that actually work with the current design? The security system is not called until the very end, except to get row filters, column masks and "run-as Identity" for views. And by that point some crucial information is lost.
Oh, you mean the current design should be replaced with what you propose?
I think we could implement CachingAccessControl that is used for a single query execution (like transactional CachingHiveMetastore. The thing that is implemented now, is caching things only for tables. The access control changed a lot since when it was added. Great idea @dain .
However, any kind of caching here requires good equals and hashCode implementation for Identity. So this PR still is valid and is pending the review.
The security system is not called until the very end,
This is not true. We do access control checks for everything else than tables (function calls etc).
If you extract the first commit to separate PR, then I can merge it.
Replaced with https://github.com/trinodb/trino/pull/15111