Support for async Python

Open gj opened this issue 2 years ago • 10 comments

This is an external tracking issue to:

  • Gauge interest from the community for this feature.
  • Learn about what you'd want to see out of it if we worked on it.

So please:

  • Upvote the issue if it's important to you, and
  • Comment with any relevant info on your requirements, use cases, etc.

Thanks!

P.S.: For now, we do all our internal engineering issue tracking separately in Notion, so you won't necessarily see regular updates to the project status here even once we begin work.

gj avatar Oct 19 '21 18:10 gj

We're interested in using Oso at the company where I work, but we're having a hard time figuring out how to ensure using Oso won't negatively impact latency (compared to hand-written authorization code). Our tech stack relies heavily on asynchronous Python:

We also rely heavily on dataloaders (via the aio-dataloader Python package) to minimize the number of database round-trips required for each request.

We'd like to use Oso to perform resource-based authorization, but are struggling to figure out how to efficiently perform database calls required to determine if an action on a resource is authorized for the current user. Ideally we could define all of our authorization policies and their data dependencies in our .polar files. But without async Python support, we would be limited to separate, synchronous calls to the database for every resource that needs to be authorized in a single request.

Our requests frequently involve many resources. For example, a user might want to list 100 resources in a single GraphQL request. We would like to be able to use async directives in our .polar files and a DataLoader to deduplicate authorization-related database lookups for all 100 resources into a single database call:

GraphQL Request --+-->  Resource 1  -> Oso authorization query -> Authorization Dataloader --+--> Database call
                  |                                                                          |
                  |-->  Resource 2  -> Oso authorization query -> Authorization Dataloader --|
                  |                                                                          |
                  |-->     ...      -> Oso authorization query -> Authorization Dataloader --|
                  |                                                                          |
                  \--> Resource 100 -> Oso authorization query -> Authorization Dataloader --/

The best workaround we've been able to come up with is prefetching authorization information in an asynchronous context before the Oso authorization query and passing in the relevant authorization information so it can be processed by Oso. Our preference though would be for Oso to be able to automatically decide whether or not this information even needs to be retrieved (for example, admin users don't need a database call).
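For readers who want to picture the prefetching workaround, here is a minimal sketch. It does not call Oso: `is_allowed` is a plain-Python stand-in for the synchronous policy query, and `fetch_allowed_resource_ids`, `authorize_many`, and the `User` class are hypothetical names invented for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool = False


async def fetch_allowed_resource_ids(user_id: int) -> Set[int]:
    """Hypothetical async lookup; stands in for one batched database query."""
    await asyncio.sleep(0)  # placeholder for real I/O
    return {1, 2, 3} if user_id == 42 else set()


def is_allowed(user: User, resource_id: int, allowed_ids: Set[int]) -> bool:
    """Synchronous decision; a stand-in for the Oso policy query."""
    return user.is_admin or resource_id in allowed_ids


async def authorize_many(user: User, resource_ids: List[int]) -> Dict[int, bool]:
    # Prefetch in the async context, then hand the data to the sync check.
    # The admin shortcut has to live here, outside the policy.
    if user.is_admin:
        allowed_ids: Set[int] = set()
    else:
        allowed_ids = await fetch_allowed_resource_ids(user.id)
    return {rid: is_allowed(user, rid, allowed_ids) for rid in resource_ids}
```

Note that the decision about whether to fetch at all sits in application code here, which is exactly the part we would rather express in the policy.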

I'm very new to Oso, so it's very possible that I'm missing an easier solution to this problem, but in my mind support for async Python methods would be a great step in the right direction for us 😄

connorbrinton avatar Nov 12 '21 17:11 connorbrinton

@connorbrinton Thanks for the write-up! That's super useful context to understand. From what you're describing, I'd say the workaround you describe (pre-fetching relevant data and making it available during the policy execution) is probably your best bet.

I'm a bit worried that even with async support, the API still wouldn't enable the dataloader-like pattern you're describing. That might end up being a separate feature entirely (one that I quite like the sound of!). I do have a couple questions which might help me recommend a path:

  1. In your example, are all the 1...100 resources of the same type?
  2. What type of authorization data (organization roles, for instance) might you need to fetch to authorize those resources? Might that data be the same for different resources?

gkaemmer avatar Nov 12 '21 19:11 gkaemmer

I'm a bit worried that even with async support, the API still wouldn't enable the dataloader-like pattern you're describing. That might end up being a separate feature entirely (one that I quite like the sound of!).

I think having database call batching be separate from Oso's authorization logic definitely makes sense 👍 Part of the magic of dataloaders is that as long as Oso supports calling async methods, end-users can use dataloaders without any changes to Oso. I'm not terribly well-versed in .polar files yet, but I'm imagining logic something like this:

allow(context, user, action, resource) if
    await context["resource_permission_loader"].user_can_access_resource(user, action, resource)

The simple implementation of the resource permission loader would be:

class ResourcePermissionLoader:
    async def user_can_access_resource(self, user, action, resource) -> bool:
        # Make a database call to check the user's permissions to act on the given resource
        ...

        return decision

The DataLoader-based implementation would be:

from typing import Iterable, Tuple

from aiodataloader import DataLoader

class ResourcePermissionLoader(DataLoader):
    async def batch_load_fn(self, keys: Iterable[Tuple[User, Action, Resource]]) -> Iterable[bool]:
        # Make a single database call to check permissions of all users to act on the corresponding resources
        ...

        return decisions

    async def user_can_access_resource(self, user, action, resource) -> bool:
        # load() queues the key and resolves once the batch has been fetched
        return await self.load((user, action, resource))

Both approaches work exactly the same from the perspective of Oso, but the Dataloader-based approach batches together all queries made in a single asynchronous tick, deduplicates them and makes a single database call to service all of the requests, reducing latency.
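To make the "single asynchronous tick" behaviour concrete, here is a dependency-free sketch of the batching idea. `TinyBatchLoader` is a toy written for this comment, not the real dataloader package, and `check_permissions` and `can_read` are hypothetical helpers; a real loader adds caching and more careful error handling.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Hashable, List


class TinyBatchLoader:
    """Toy loader: coalesces load() calls made in the same event-loop tick."""

    def __init__(self, batch_fn: Callable[[List[Hashable]], Awaitable[List[bool]]]):
        self._batch_fn = batch_fn
        self._pending: Dict[Hashable, asyncio.Future] = {}
        self._task = None  # keeps a reference to the in-flight dispatch task

    def load(self, key: Hashable) -> "asyncio.Future[bool]":
        loop = asyncio.get_running_loop()
        if not self._pending:
            # First key of this tick: schedule one dispatch that runs
            # after the callbacks already queued on the event loop.
            self._task = loop.create_task(self._dispatch())
        if key not in self._pending:
            # Duplicate keys share one future, which deduplicates lookups.
            self._pending[key] = loop.create_future()
        return self._pending[key]

    async def _dispatch(self) -> None:
        pending, self._pending = self._pending, {}
        try:
            results = await self._batch_fn(list(pending))
        except Exception as exc:
            for future in pending.values():
                future.set_exception(exc)
            return
        for future, result in zip(pending.values(), results):
            future.set_result(result)


batch_calls: List[List[Hashable]] = []


async def check_permissions(keys: List[Hashable]) -> List[bool]:
    """Hypothetical batch function; one database query would go here."""
    batch_calls.append(keys)
    return [resource_id % 2 == 0 for (_user, _action, resource_id) in keys]


async def can_read(loader: TinyBatchLoader, resource_id: int) -> bool:
    return await loader.load(("alice", "read", resource_id))


async def main() -> List[bool]:
    loader = TinyBatchLoader(check_permissions)
    resource_ids = list(range(100)) + [0]  # 101 requests, 100 distinct keys
    return await asyncio.gather(*(can_read(loader, rid) for rid in resource_ids))
```

Running `asyncio.run(main())` issues 101 independent authorization requests, yet `check_permissions` is invoked once with 100 distinct keys.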

  1. In your example, are all the 1...100 resources of the same type?

Yup! Each resource represents a text classification model, which we selectively allow our clients to access based on whether it's generally available or client-specific.

  2. What type of authorization data (organization roles, for instance) might you need to fetch to authorize those resources? Might that data be the same for different resources?

For our text classification models, we currently perform batch authorization decisions using a dataloader that we call manually whenever an authorization decision is needed. All authorization queries are batched together and the dataloader examines the following criteria for each query:

  1. Is the user a global admin? (if so, everything is accessible)
  2. Which organizations can the user access with a role that permits access to text classification models?
  3. Which organizations have access to each text classification model?
  4. Is there any overlap between (2) and (3)?

(1) and (2) are provided to our app through special headers (similar to a JWT), so we don't need to do any kind of special lookup to access that information. If (3) is necessary, the dataloader will issue a single database query retrieving information for every authorization query at once. It then performs some computation to determine (4).

(1) and (2) are used for authorization decisions on multiple resource types, but (3) and (4) are only used for text classification model access decisions. (3) is the bit of information that we're interested in retrieving asynchronously so we can batch together similar requests.
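As a rough illustration of how criteria (1) through (4) fit together, here is a sketch of the batch decision. It is not our production code: `authorize_models`, `fake_fetch`, and the string IDs are hypothetical, and the fetch callable stands in for the single database query behind (3).

```python
import asyncio
from typing import Awaitable, Callable, Dict, FrozenSet, List, Sequence

OrgsByModel = Dict[str, FrozenSet[str]]


async def authorize_models(
    is_global_admin: bool,
    user_org_ids: FrozenSet[str],
    model_ids: Sequence[str],
    fetch_orgs_by_model: Callable[[Sequence[str]], Awaitable[OrgsByModel]],
) -> List[bool]:
    # (1) Global admins can access everything, so no lookup is needed.
    if is_global_admin:
        return [True] * len(model_ids)
    # (3) One query covering every model in the batch.
    orgs_by_model = await fetch_orgs_by_model(model_ids)
    # (4) A model is accessible when its organizations overlap with (2).
    return [
        bool(user_org_ids & orgs_by_model.get(model_id, frozenset()))
        for model_id in model_ids
    ]


async def fake_fetch(model_ids: Sequence[str]) -> OrgsByModel:
    """Hypothetical stand-in for the database query in (3)."""
    table = {"m1": frozenset({"org-a"}), "m2": frozenset({"org-b"})}
    return {m: table[m] for m in model_ids if m in table}
```

Models missing from the lookup result are treated as inaccessible, which is an assumption of this sketch rather than a statement about our real rules.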

Long-term though, I think I would be interested in being able to retrieve (1) and (2) asynchronously as well. Currently we store all user roles in authorization headers injected by an API gateway, but I could see the size of those headers getting out of control eventually, necessitating some kind of out-of-band lookup for large or less-important roles or attributes.

connorbrinton avatar Nov 15 '21 16:11 connorbrinton

@connorbrinton Sorry it took me a second to reply here -- that's super helpful context. If I'm understanding correctly, I think this type of thing could be accomplished using our data filtering feature.

Instead of performing steps 1-4 in code inside your dataloader, you could take the user and call oso.authorized_query(user, "read", TextClassificationModel). If you set up the fetchers properly, that should turn into a query something like:

  1. If the user is a global admin: SELECT * FROM text_classification_model WHERE 1 = 1
  2. If the user is not a global admin (and therefore needs access through organizations): SELECT * FROM text_classification_model WHERE text_classification_model.organization_id IN (X, Y, Z) where (X, Y, Z) are the organization IDs the user has access to.

Then, in whichever resolver you're loading the 1...100 models, you can use a subquery to make sure that the model's ID is inside of the set returned from the data filtering query. In a way, this means you're "eagerly" performing authorization as the data is being loaded, instead of performing it later by filtering the loaded data.
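To show the shape of that subquery, here is a hedged sketch using the standard-library `sqlite3` module. It does not use Oso's data filtering API; the SQL is written by hand to mirror the two cases above, and `authorized_model_ids`, `demo_connection`, and the single `organization_id` column are assumptions made for illustration.

```python
import sqlite3
from typing import List, Sequence


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory table matching the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE text_classification_model "
        "(id INTEGER PRIMARY KEY, organization_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO text_classification_model VALUES (?, ?)",
        [(1, 10), (2, 20), (3, 10)],
    )
    return conn


def authorized_model_ids(
    conn: sqlite3.Connection,
    is_global_admin: bool,
    org_ids: Sequence[int],
    requested_ids: Sequence[int],
) -> List[int]:
    if not requested_ids or (not is_global_admin and not org_ids):
        return []  # nothing requested, or no organization grants access
    if is_global_admin:
        # Case 1: no organization restriction.
        auth_filter, auth_params = "SELECT id FROM text_classification_model", []
    else:
        # Case 2: restrict to the user's organizations.
        org_marks = ",".join("?" * len(org_ids))
        auth_filter = (
            "SELECT id FROM text_classification_model "
            f"WHERE organization_id IN ({org_marks})"
        )
        auth_params = list(org_ids)
    id_marks = ",".join("?" * len(requested_ids))
    sql = (
        "SELECT id FROM text_classification_model "
        f"WHERE id IN ({id_marks}) AND id IN ({auth_filter}) ORDER BY id"
    )
    rows = conn.execute(sql, [*requested_ids, *auth_params]).fetchall()
    return [row[0] for row in rows]
```

The authorization filter runs inside the same query that loads the models, so unauthorized rows never leave the database.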

The specifics of this depend on how text classification models are related to organizations, and you might need some hacks to make this work in an async context, but the basic concept might still apply. What do you think?

Curious about one thing:

Long-term though, I think I would be interested in being able to retrieve (1) and (2) asynchronously as well. Currently we store all user roles in authorization headers injected by an API gateway, but I could see the size of those headers getting out of control eventually, necessitating some kind of out-of-band lookup for large or less-important roles or attributes.

In that scenario, where would you expect (1) and (2) to be loaded from? E.g. from the same database as (3)? Some other service over an HTTP request? I ask because we're looking into better "data loading" features for Oso policies, and this sounds super relevant to that development.

Definitely jump into our slack: https://join-slack.osohq.com/ -- we'd love to help out as you look into this more.

More details on data filtering: https://docs.osohq.com/guides/data_filtering.html

gkaemmer avatar Nov 19 '21 13:11 gkaemmer

Definitely interested in this!

Oso is getting mature and we would love to integrate it into our FastAPI backend. The lack of async Python support is the only thing holding us back.

Any update on this?

mcmoodoo avatar Jul 13 '22 20:07 mcmoodoo

Hi @gkaemmer @gj, any update on this topic? Thanks

fullonic avatar Apr 04 '23 09:04 fullonic

@gj, any updates yet?

yehuda-margolis avatar Jul 12 '23 08:07 yehuda-margolis

Hi folks, sorry for the silence. No update yet on this issue specifically. We're currently figuring out how to open source some of the work we've done on Oso Cloud. More updates to come on that in #1703.

gj avatar Jul 12 '23 20:07 gj

Hey @gj , Do you have any updates on this feature? Thanks

fullonic avatar Feb 14 '24 11:02 fullonic

Hey folks: we have deprecated this package, so we won't be able to add this feature in the near term. In the medium/long term, however, we expect to have a suitable replacement, and would definitely be interested in supporting async Python there.

gneray avatar Feb 14 '24 13:02 gneray