
Auth: Use entity type and ID for authorization checks

Status: Open • markylaing opened this issue 3 months ago • 2 comments

Please confirm

  • [x] I have searched existing issues to check if an issue already exists for my feature request.

Is your feature request related to a problem? Please describe.

We initially wrote the OpenFGA datastore to return tuples of the form:

{
   "user": "<entity type>:<entity URL>",
   "relation": "<relation>",
   "object": "<entity type>:<entity URL>"
}

e.g.

{
   "user": "group:/1.0/auth/groups/{name}#member",
   "relation": "can_view",
   "object": "instance:/1.0/instances/{name}?project={project_name}"
}

This was so that a permission check could be performed directly against a URL. For example, when checking access to /1.0/instances/{name}?project=default, the caller wouldn't need to get the ID of the instance before performing the check.

This made it easier to retrofit fine-grained access control into LXD, but was shortsighted because the OpenFGA datastore spends a lot of time resolving entity types and IDs (from auth_groups_permissions) into URLs to return as tuples. This is likely causing fine-grained authorization to be slow.

Describe the solution you'd like

The OpenFGA datastore should return tuples of the form:

{
   "user": "<entity type>:<entity ID>",
   "relation": "<entitlement>",
   "object": "<entity type>:<entity ID>"
}

e.g.

{
   "user": "group:1#member",
   "relation": "can_view",
   "object": "instance:10"
}

The auth.Authorizer interface must be modified to account for this change, and every call to the authorizer will require getting the entity ID before performing the access check. Note, however, that this should not slow LXD down, because we always need to load the resource from the database at some point anyway.
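With the ID-based form, building a tuple needs only an entity type and a database ID, with no URL resolution. The following is a minimal Go sketch of that string construction. TupleKey, groupMember and object are hypothetical names chosen for illustration. They are not the OpenFGA SDK types or LXD's actual helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TupleKey mirrors the JSON shape of the tuples above. It is a plain
// stand-in struct for illustration, not an OpenFGA type.
type TupleKey struct {
	User     string `json:"user"`
	Relation string `json:"relation"`
	Object   string `json:"object"`
}

// groupMember returns the ID-based user string for members of an auth group.
func groupMember(groupID int) string {
	return fmt.Sprintf("group:%d#member", groupID)
}

// object returns the ID-based object string "<entity type>:<entity ID>".
func object(entityType string, entityID int) string {
	return fmt.Sprintf("%s:%d", entityType, entityID)
}

func main() {
	t := TupleKey{
		User:     groupMember(1),
		Relation: "can_view",
		Object:   object("instance", 10),
	}
	b, err := json.Marshal(t)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// Prints: {"user":"group:1#member","relation":"can_view","object":"instance:10"}
}
```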

Describe alternatives you've considered

No response

Additional context

No response

markylaing, Sep 25 '25 14:09

Preliminary work for this should change how entity references (used-by URLs) are loaded from the database. We need to filter these by the permissions of the caller, so they will need to be of the form map[entity.Type][]int or similar.
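A minimal sketch of how filtering ID-based references might work, assuming one checker per entity type of the form func(entityID int) bool. EntityType, PermissionChecker and filterUsedBy are hypothetical stand-ins for entity.Type and the authorizer's checker. They are not LXD code.

```go
package main

import "fmt"

// EntityType is a hypothetical stand-in for LXD's entity.Type.
type EntityType string

const (
	TypeInstance EntityType = "instance"
	TypeProfile  EntityType = "profile"
)

// PermissionChecker reports whether the caller may view the entity with the given ID.
type PermissionChecker func(entityID int) bool

// filterUsedBy returns only the references the caller is allowed to see.
// Entity types without a checker are dropped, so a missing checker fails closed.
func filterUsedBy(refs map[EntityType][]int, checkers map[EntityType]PermissionChecker) map[EntityType][]int {
	visible := make(map[EntityType][]int, len(refs))
	for entityType, ids := range refs {
		canView, ok := checkers[entityType]
		if !ok {
			continue
		}
		for _, id := range ids {
			if canView(id) {
				visible[entityType] = append(visible[entityType], id)
			}
		}
	}
	return visible
}

func main() {
	refs := map[EntityType][]int{
		TypeInstance: {10, 11, 12},
		TypeProfile:  {1, 2},
	}
	checkers := map[EntityType]PermissionChecker{
		TypeInstance: func(id int) bool { return id != 11 }, // Caller cannot view instance 11.
		TypeProfile:  func(id int) bool { return true },
	}
	fmt.Println(filterUsedBy(refs, checkers)) // Prints: map[instance:[10 12] profile:[1 2]]
}
```

The filtered result is still keyed by type and ID, so turning it back into URLs for the API response is a separate step. That is the cost discussed under Attempt 1 below.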

markylaing, Sep 25 '25 14:09

tl;dr

  • No simple solution
  • Will add structure/guidelines to our database access patterns in LXD. Then using entity type and ID will follow naturally.

Investigation

I've investigated a few different approaches, and each runs into one stumbling block or another. Below I'll summarise each approach and why it doesn't work. This will help to rationalise the much larger refactor I suggest later.

Attempt 1: Just start using entity type and ID

This naive approach refactored the Authorizer interface as below with no other changes:

type PermissionChecker func(entityID int) bool

type Authorizer interface {
	CheckPermission(ctx context.Context, entitlement Entitlement, entityType entity.Type, entityID int) error
	GetPermissionChecker(ctx context.Context, entitlement Entitlement, entityType entity.Type) (PermissionChecker, error)
	/* other methods unchanged */
}

This works nicely at most call sites and even simplifies many cases, but there are 2 problems:

  1. Collation of weak references ("used-by" URLs). The methods that collate these references all return a []string. We could change them to return something like map[entity.Type][]int, as mentioned in the previous comment. However, we would then have to convert these references back to URLs for the API response, effectively querying the same database rows twice, which is exactly what we are trying to avoid.
  2. The OpenFGA datastore implementation currently has optimisations in place for when it needs to know the parent of an entity. For example, when asking "Does user have entitlement can_view on profile:/1.0/profiles/foo?project=bar", the current datastore knows that the parent project of the "foo" profile is project:/1.0/projects/bar, because it forms part of the URL. Once IDs replace URLs, the datastore needs to perform additional queries to determine the parent of an entity. This issue also applies to the TLS authorizer for restricted TLS clients.
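The contrast in point 2 can be sketched with two hypothetical helpers. One derives the parent project purely from a URL-based object string. The other receives an ID-based object and needs a lookup, represented here by a map that stands in for the extra database query. The function names, and the rule that a missing project parameter means the default project, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// parentFromURLObject derives the parent project of a URL-based object such
// as "profile:/1.0/profiles/foo?project=bar" from the string alone.
// Assumption: a missing project parameter means the "default" project.
func parentFromURLObject(object string) (string, error) {
	_, rawURL, ok := strings.Cut(object, ":")
	if !ok {
		return "", fmt.Errorf("malformed object %q", object)
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	project := u.Query().Get("project")
	if project == "" {
		project = "default"
	}
	return "project:/1.0/projects/" + project, nil
}

// parentFromIDObject handles an ID-based object such as "profile:10". The
// string no longer carries the project, so a lookup is required; the
// profileProjects map (profile ID -> project ID) stands in for that query.
func parentFromIDObject(object string, profileProjects map[int]int) (string, error) {
	entityType, rawID, ok := strings.Cut(object, ":")
	if !ok || entityType != "profile" {
		return "", fmt.Errorf("expected a profile object, got %q", object)
	}
	id, err := strconv.Atoi(rawID)
	if err != nil {
		return "", err
	}
	projectID, ok := profileProjects[id]
	if !ok {
		return "", fmt.Errorf("profile %d not found", id)
	}
	return "project:" + strconv.Itoa(projectID), nil
}

func main() {
	fmt.Println(parentFromURLObject("profile:/1.0/profiles/foo?project=bar"))
	fmt.Println(parentFromIDObject("profile:10", map[int]int{10: 2}))
}
```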

Attempt 2: Encapsulate that auth entities have a parent

This was an attempt to solve the second issue from attempt 1 (ignoring the used-by issue). The idea was to endow each "entity" being checked with a parent, to match the authorization model.

type Entity struct {
	Type   entity.Type
	ID     int
	Parent *Entity
}

type PermissionChecker func(entity Entity) bool

type Authorizer interface {
	CheckPermission(ctx context.Context, entitlement Entitlement, entity Entity) error
	GetPermissionChecker(ctx context.Context, entitlement Entitlement, entityType entity.Type) (PermissionChecker, error)
	/* other methods unchanged */
}

This required a LOT more work at most, if not all, call sites, and was awkward because we can't pass an "Entity" as an argument to the OpenFGA datastore. The datastore can never accept an argument of a type that we control; it only receives OpenFGA tuples as strings. Also, we still have no way of handling used-by URLs.

Attempt 3: Addressing used-by URL checks

type Entity interface {
	Type() entity.Type
	DatabaseID() int
	URL() api.URL
	Parent() Entity
}

type PermissionChecker func(entity Entity) bool

type Authorizer interface {
	CheckPermission(ctx context.Context, entitlement Entitlement, entity Entity) error
	GetPermissionChecker(ctx context.Context, entitlement Entitlement, entityType entity.Type) (PermissionChecker, error)
	/* other methods unchanged */
}

This attempt also requires a lot of refactoring effort. However, requiring that an auth.Entity can report its own URL is very useful for filtering and reporting used-by URLs.
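A minimal sketch of how the Attempt 3 interface could serve used-by filtering. Because each entity reports its own URL and parent, entities that are already loaded can be filtered and rendered without a second query. The project and profile types are hypothetical. URL returns a string instead of api.URL so that the example has no LXD dependency.

```go
package main

import (
	"fmt"
	"net/url"
)

// EntityType is a hypothetical stand-in for LXD's entity.Type.
type EntityType string

// Entity follows the Attempt 3 interface, with URL returning a string.
type Entity interface {
	Type() EntityType
	DatabaseID() int
	URL() string
	Parent() Entity
}

type project struct {
	id   int
	name string
}

func (p project) Type() EntityType { return "project" }
func (p project) DatabaseID() int { return p.id }
func (p project) URL() string { return (&url.URL{Path: "/1.0/projects/" + p.name}).String() }
func (p project) Parent() Entity { return nil } // Projects are top-level in this sketch.

type profile struct {
	id      int
	name    string
	project project
}

func (p profile) Type() EntityType { return "profile" }
func (p profile) DatabaseID() int { return p.id }
func (p profile) URL() string {
	u := url.URL{Path: "/1.0/profiles/" + p.name, RawQuery: url.Values{"project": {p.project.name}}.Encode()}
	return u.String()
}
func (p profile) Parent() Entity { return p.project }

// PermissionChecker matches the Attempt 3 signature.
type PermissionChecker func(e Entity) bool

// usedByURLs filters already-loaded entities and reports their URLs
// without another query, because each entity knows its own URL.
func usedByURLs(entities []Entity, canView PermissionChecker) []string {
	urls := make([]string, 0, len(entities))
	for _, e := range entities {
		if canView(e) {
			urls = append(urls, e.URL())
		}
	}
	return urls
}

func main() {
	foo := project{id: 1, name: "foo"}
	bar := project{id: 2, name: "bar"}
	usedBy := []Entity{
		profile{id: 10, name: "default", project: foo},
		profile{id: 11, name: "gpu", project: bar},
	}
	// Example checker: the caller may only view entities in project "bar".
	canView := func(e Entity) bool {
		parent := e.Parent()
		return parent != nil && parent.DatabaseID() == bar.id
	}
	fmt.Println(usedByURLs(usedBy, canView)) // Prints: [/1.0/profiles/gpu?project=bar]
}
```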

The wider problem

With both attempts #2 and #3 there is another problem. The GetPermissionChecker function gets a list of all URLs of the given entity type, which requires a SQL query against that entity type. All API handlers that call GetPermissionChecker then query the same table and data again to get a list of entities, and filter them with the permission checker.

There are actually many places in LXD where we query the same tables over and over again. I've cherry-picked one case study here, but there are examples of this all over the codebase. Let's say I have two projects, "foo" and "bar". When I call GET /1.0/projects?recursion=1, LXD performs the following queries:

  1. To get a fine-grained permission checker:
    1. SELECT auth_groups.name, auth_groups_permissions.entity_id, auth_groups_permissions.entity_type, auth_groups_permissions.entitlement FROM auth_groups JOIN auth_groups_permissions ON auth_groups_permissions.auth_group_id = auth_groups.id
    2. SELECT 3, projects.id, projects.name, '', json_array(projects.name) FROM projects.
  2. To list the projects:
    3. SELECT projects.id, projects.description, projects.name FROM projects ORDER BY projects.name
    4. SELECT projects.id FROM projects WHERE projects.name = ? (for project foo)
    5. SELECT projects_config.key, projects_config.value FROM projects_config WHERE projects_config.project_id = ? (for project foo configuration)
    6. SELECT 5, instances.id, projects.name, '', json_array(instances.name) FROM instances JOIN projects ON instances.project_id = projects.id WHERE projects.name = ? UNION SELECT 2, profiles.id, projects.name, '', json_array(profiles.name) FROM profiles JOIN projects ON profiles.project_id = projects.id WHERE projects.name = ? UNION SELECT 1, images.id, projects.name, '', json_array(images.fingerprint) FROM images JOIN projects ON images.project_id = projects.id WHERE projects.name = ? UNION ... (and 6 more unions for project foo used-by URLs)
    7. SELECT projects.id FROM projects WHERE projects.name = ? (for project bar)
    8. SELECT projects_config.key, projects_config.value FROM projects_config WHERE projects_config.project_id = ? (for project bar configuration)
    9. SELECT 5, instances.id, projects.name, '', json_array(instances.name) FROM instances JOIN projects ON instances.project_id = projects.id WHERE projects.name = ? UNION SELECT 2, profiles.id, projects.name, '', json_array(profiles.name) FROM profiles JOIN projects ON profiles.project_id = projects.id WHERE projects.name = ? UNION SELECT 1, images.id, projects.name, '', json_array(images.fingerprint) FROM images JOIN projects ON images.project_id = projects.id WHERE projects.name = ? UNION ... (and 6 more unions for project bar used-by URLs)
  3. To filter used-by URLs:
    1. SELECT 5, instances.id, projects.name, '', json_array(instances.name) FROM instances JOIN projects ON instances.project_id = projects.id (getting a permission checker for instances in project foo)
    2. SELECT 2, profiles.id, projects.name, '', json_array(profiles.name) FROM profiles JOIN projects ON profiles.project_id = projects.id (getting a permission checker for profiles in project foo)
    ...
    x. SELECT 5, instances.id, projects.name, '', json_array(instances.name) FROM instances JOIN projects ON instances.project_id = projects.id (getting a permission checker for instances in project bar)
    y. SELECT 2, profiles.id, projects.name, '', json_array(profiles.name) FROM profiles JOIN projects ON profiles.project_id = projects.id (getting a permission checker for profiles in project bar)

It is evident that there is a lot of redundancy here, both in the number of queries and the retrieved information.

Objective

Remove some bad patterns

One pattern that I would like to avoid in the future is any function signature that looks like Entity.ToAPI(ctx context.Context, tx *sql.Tx). This signature is very convenient when getting a single resource, but it incurs a high cost when invoked in a loop. Typically, this pattern ties the number of database calls to the number of resources in the table. For example, the number of queries executed when listing projects grows with the number of projects.

This can be avoided by querying tables separately and joining the results in Go.
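As an illustration of joining in Go, the sketch below assumes two queries, one over projects and one over projects_config (also selecting project_id). It attaches each config row to its project in memory, so the query count stays fixed as projects are added. The row types and joinProjects are hypothetical.

```go
package main

import "fmt"

// Hypothetical row types for two independent queries, e.g.
//   SELECT id, name FROM projects
//   SELECT project_id, key, value FROM projects_config
type projectRow struct {
	ID   int
	Name string
}

type configRow struct {
	ProjectID int
	Key       string
	Value     string
}

// Project is the joined result handed to the API layer.
type Project struct {
	ID     int
	Name   string
	Config map[string]string
}

// joinProjects attaches config rows to their projects in memory: two queries
// in total, rather than one config query per project.
func joinProjects(projects []projectRow, configs []configRow) []Project {
	out := make([]Project, len(projects))
	byID := make(map[int]*Project, len(projects))
	for i, p := range projects {
		out[i] = Project{ID: p.ID, Name: p.Name, Config: map[string]string{}}
		byID[p.ID] = &out[i] // Safe: out is never appended to, so pointers stay valid.
	}
	for _, c := range configs {
		if p, ok := byID[c.ProjectID]; ok {
			p.Config[c.Key] = c.Value
		}
	}
	return out
}

func main() {
	projects := []projectRow{{1, "bar"}, {2, "foo"}}
	configs := []configRow{{2, "features.images", "true"}, {1, "limits.instances", "5"}}
	for _, p := range joinProjects(projects, configs) {
		fmt.Println(p.Name, p.Config)
	}
}
```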

Move towards database access patterns that reduce redundancy

Typically, threads of execution in LXD follow the diagram below:

[Image: diagram of LXD subsystems along a thread of execution]

Each subsystem shown above has access to the daemon state and, through it, to the database. Almost all subsystems perform queries as required. Some data is common to all subsystems (e.g. the current project ID), but very often this data is no longer available after moving from one subsystem to another.

For example, in the device subsystem there is a function NICType (https://github.com/canonical/lxd/blob/ab7219b464eeb0f2d565058b67ca0e3eda3fcf5f/lxd/device/nictype/nictype.go#L20). This function calls into the database to get the device's project and configuration. This should not be necessary: by this point we should already have queried the project, to ensure it exists, as part of basic validation and access checking.

Plan

Overview

  1. Add a read-through cache to the request/execution context. This cache should be used to access data from the database. Within the same execution context, we should know if we have changed any of this data ourselves, and what should be invalidated.
  2. Always read all data from a row. Additionally, join sufficient tables so that types in the cache represent API resources (entities) and can fulfil the auth.Entity interface. This will allow the OpenFGA Datastore to use the cache.
  3. Optimise for fewer queries overall. The number of queries for a given API route should never depend on the number of resources.

Tasks

  • [ ] Propagate context.Context to all subsystems.
    • [ ] ACL
    • [ ] Network
    • [ ] Device
    • [ ] Storage
    • [ ] Instance
    • [ ] DNS
    • [ ] Operations (need to make request context contents available after request context may have cancelled).
    • [ ] ...more
  • [ ] Add a struct to the request context that will be used for database queries (reads only)
  • [ ] Implement methods for all entity types, then refactor API handlers to use the new methods.
    • [ ] Instance
    • [ ] Image
    • [ ] Profile
    • [ ] Network
    • [ ] Network ACL
    • [ ] Network Zone
    • [ ] Storage pool
    • [ ] Storage volume
    • [ ] Storage bucket
    • [ ] Backups and Snapshots (Instances and Volumes)
    • [ ] Projects
    • [ ] Identities
    • [ ] Auth Groups
    • [ ] IdP groups
  • [ ] Update the OpenFGA datastore to use entity IDs
  • [ ] Update Authorizers to use entity IDs

markylaing, Nov 18 '25 14:11