trino icon indicating copy to clipboard operation
trino copied to clipboard

Add most properties to Identity's hashCode and equals

Open ksobolew opened this issue 3 years ago • 15 comments

Description

The rationale is that, in addition to the user name, all of these can affect access control decisions for an Identity, so they should be considered distinct if any of those attributes differ.

The StatementAnalyzer, during the analysis stage, collects all table references along with their corresponding access controls and identities. At the end of the analysis, this is used to do the access checks. They are collected in a Map, where the key is the pair of access control and Identity, so that we deduplicate the necessary checks. But for this deduplication to work properly, Identity instances that result in different accesses should be considered distinct.

In case of views, if we need to add and additional reference, we add it with a changed access control, so the key is properly deduplicated. But a table reference is also added for column masks and row filters. For these we get a ViewExpression, which contains only the user name. From that we have to construct a proper Identity. Previously it was entirely "synthetic" and included no roles; if the current session has roles enabled, this now makes it distinct. If, then, access to the table is granted for the role, not for the user, then that Identity will fail the access checks and the query will fail.

The simple trick to solve this is to return the session's Identity if the mask's or filter's user name is equal to the current session's user. A more proper solution would make access controls return a full Identity the same way it returns it for views with SECURITY DEFINER.

Non-technical explanation

Fixing a possible source of bugs.

Release notes

(x) This is not user-visible and no release notes are required. ( ) Release notes are required, please propose a release note for me. ( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

ksobolew avatar Sep 14 '22 12:09 ksobolew

@dain, I can see that there is a class io.trino.security.AccessControlUtil.FullIdentityEquality that is kind of used to achieve the same thing. It raises a couple of questions:

  • Why was it added as a separate class instead of changing Identity#equals?
  • Should it be used more widely in places where this matters?
  • Or, should I include more attributes in this PR and remove io.trino.security.AccessControlUtil.FullIdentityEquality?
    • Im 90% certain that catalogRoles should also be compared, but I'm not sure about the principal...

ksobolew avatar Sep 14 '22 12:09 ksobolew

Also, do you have any idea how this can be tested? :)

ksobolew avatar Sep 14 '22 12:09 ksobolew

Column masks break with this change :/ (No, apparently we don't have enough coverage to detect this.)

ksobolew avatar Sep 15 '22 10:09 ksobolew

I think we should fallow FullIdentityEquality. Move the semantic to Identity and remove FullIdentityEquality.

In regards to testing I would propose the below: Given: User U has roles A and B. User U is an an owner of view V. V is selecting from table T. Role A has access to table T. Then: User U with role B enabled selects from V.

If I understand correctly, previously it should fail as we would use U with role B to select from T.

kokosing avatar Sep 15 '22 11:09 kokosing

Column masks break with this change

Can you please elaborate?

kokosing avatar Sep 15 '22 11:09 kokosing

Column masks break with this change

Can you please elaborate?

I was still investigating, while also learning way more about the analyzer than I wanted ;)

When we analyze the FROM clause, we use the current session to add the table reference. That session has all the enabled roles present. When there is also a mask on that table, we get it as a ViewExpression, which contains an identity as a String - basically just a user name. I don't know why we do that, really. All implementations of (system) access controls I know of just put the current session's user name there. Is there a possibility that a mask will be evaluated as a different user? Anyway, we take that user name and create a separate Identity for it, which does not have any enabled roles. It does not .equals() the one that we put for the table before, so it gets inserted as a separate entry. So at the end we do the access check for the table twice - once with an identity with enabled roles, and once without enabled roles. If the user gains access permissions via grants to the role, not to the user, then the second access check will fail.

ksobolew avatar Sep 15 '22 11:09 ksobolew

When there is also a mask on that table, we get it as a ViewExpression, which contains an identity as a String - basically just a user name

Can we change that to make it a full Identity?

(Yes, that's an SPI compat break)

ksobolew avatar Sep 15 '22 11:09 ksobolew

ViewExpression, which contains an identity as a String

This is like for any other view that has user owner only. So then you need to use all the roles of given user to evaluate the mask or filter. Masking or filtering is very similar to views.

I don't know why we do that,

I am not sure if I know what you don't know. It is because the fact the evaluation may require more access than the actual user. For example to filter data in one table you access the other table that is only visible to admins. Also this is required if you have multiple masks, filters or mix. Then if you mask something and then you want to use the original value to filter you cannot do it as a session user as you would see the masked value.

All implementations of (system) access controls I know of just put the current session's user name there.

Yes. I think it is the limitation of the implementation of the plugin, not the engine. As you can see your plugins could do more.

Anyway, we take that user name and create a separate Identity for it, which does not have any enabled roles.

I think that is a mistake we should use all roles of given user. You can easily use https://github.com/trinodb/trino/pull/14130#issuecomment-1247957840 to generate issues, just replace a view with mask or filter. Please post tests for both.

kokosing avatar Sep 15 '22 12:09 kokosing

Can we change that to make it a full Identity?

That would be cool. But one step a time.

kokosing avatar Sep 15 '22 12:09 kokosing

In regards to testing I would propose the below: Given: User U has roles A and B. User U is an an owner of view V. V is selecting from table T. Role A has access to table T. Then: User U with role B enabled selects from V.

If I understand correctly, previously it should fail as we would use U with role B to select from T.

I think this would work in #13991, but nor here :)

ksobolew avatar Sep 15 '22 15:09 ksobolew

You are right. Next try:

You can use io.trino.metadata.SystemSecurityMetadata#getViewRunAsIdentity.

SELECT * FROM t, v, where v is SELECT * FROM t. User U has roles A which is able to read from t and it is a session user. getViewRunAsIdentity return just U without any role so it should fail (as view should not be able to read from t) while now it will pass.

kokosing avatar Sep 16 '22 11:09 kokosing

Add enabled roles and groups to Identity's identity

Funny.

I'd call this "Include enable roles and groups in Identity equals" not as nice English, but avoid being overly playful

findepi avatar Sep 16 '22 15:09 findepi

...and hashCode.

Sure, I can change that.

ksobolew avatar Sep 16 '22 20:09 ksobolew

You can use io.trino.metadata.SystemSecurityMetadata#getViewRunAsIdentity.

SELECT * FROM t, v, where v is SELECT * FROM t. User U has roles A which is able to read from t and it is a session user. getViewRunAsIdentity return just U without any role so it should fail (as view should not be able to read from t) while now it will pass.

I don't think this can be triggered for views, because when a view is involved, we also change the implementations of access control, which is definitely distinct. But it indirectly shows with column masks and row filters, which also generate table references. I had to change a couple of tests to make that evident, though.

ksobolew avatar Sep 19 '22 14:09 ksobolew

ci hit: #13946

ksobolew avatar Sep 21 '22 11:09 ksobolew

At the end of the analysis, this is used to do the access checks. They are collected in a Map, where the key is the pair of access control and Identity, so that we deduplicate the necessary checks. But for this deduplication to work properly, Identity instances that result in different accesses should be considered distinct.

I don't understand why we would do this. The security system should cache access control checks in the query and reuse them. Then you don't need to dedupe. This is a better design as the keys for the cache are security system sensitive. Some security systems only look at roles, some only user id, some only extra credentials, and some are just fixed rules for everyone. This means that if the security system is not caching the policies in the query, then you could get called multiple times for the parts of the identity that you use, and do multiple policy lookups getting different views of the policy. Therefore, the deduplication in the engine is just useless extra work.

dain avatar Oct 26 '22 19:10 dain

The security system should cache access control checks in the query and reuse them.

Can that actually work with the current design? The security system is not called until the very end, except to get row filters, column masks and "run-as Identity" for views. And by that point some crucial information is lost.

ksobolew avatar Oct 27 '22 10:10 ksobolew

Oh, you mean the current design should be replaced with what you propose?

ksobolew avatar Oct 27 '22 10:10 ksobolew

I think we could implement CachingAccessControl that is used for a single query execution (like transactional CachingHiveMetastore. The thing that is implemented now, is caching things only for tables. The access control changed a lot since when it was added. Great idea @dain .

However, any kind of caching here requires good equals and hashCode implementation for Identity. So this PR still is valid and is pending the review.

The security system is not called until the very end,

This is not true. We do access control checks for everything else than tables (function calls etc).

kokosing avatar Oct 27 '22 12:10 kokosing

If you extract the first commit to separate PR, then I can merge it.

kokosing avatar Nov 18 '22 20:11 kokosing

Replaced with https://github.com/trinodb/trino/pull/15111

kokosing avatar Nov 18 '22 21:11 kokosing