Design config for authorization logic in backend
Currently, authorization logic is hard-coded in the backend (#76)
For now, we will implement a config design to make this logic configurable. The ultimate goal of the design is not to be fully flexible, but mostly to be fully specified and predictable.
In upstream helm charts, we plan to make this configuration required, but also to provide a default policy.
Proposed Design:
Currently the User Management section of the config looks like this:
UserManagement "KeycloakAuth" {
plugin_data {
jwksURL = "..."
redirectURL = "..."
}
}
I propose we add a section inside UserManagement.plugin_data that will define the auth logic:
UserManagement "KeycloakAuth" {
plugin_data {
jwksURL = "..."
redirectURL = "..."
auth_logic {
...
}
}
}
In this way, we ensure the policy defined is explicit, and we do not increase the number of config files required for Tornjak. This means we must fit with HCL format.
Usually with IAMs that provide the access tokens to be verified have a set of claims. Currently, the hard-coded auth logic checks a specific claim for the inclusion of a role. E.g., a JWT from Keycloak is parsed into this type where the field realm_access.roles is a list of roles.
For now, we will make the assumption that realm_access.roles is available and lists the roles of a user (this will fit in with future work involving implementing dex as alternative IAM to plug into - dex has a claimMapping option). The user should define a dictionary mapping roles to API calls, which is a many-to-many relation. There are two ways to do this: Role-to-API or API-to-Role mapping.
Option 1: Role-to-API Mapping
auth_logic "rbac" {
role "admin" {
allowed_commands = ["/api/healthcheck", "/api/agent/list", "/api/agent/ban"]
}
role "viewer" {
allowed_commands = ["/api/healthcheck", "/api/agent/list"]
}
}
In the above defined policy, a verified access token with the claim realm_access.roles=admin will be able to make the three API calls listed, viewer will be able to make the two API calls listed, and neither are able to make other API calls such as the /api/debugserver call.
Pros: easier to define a new role clearly Cons: have to list every API for every role
Option 2: API-to-Role Mapping
auth_logic "rbac" {
API "/api/healthcheck" {
allowed_roles = ["admin", "viewer"]
}
API "/api/agent/ban" {
allowed_roles = ["admin"]
}
}
In the above defined policy, access tokens with the claim admin are able to make both API calls above, and access tokens with the claim viewer is able to make only the /api/healthcheck call above. No one is able to make any other API calls.
Pros: easier to track access to each API Cons: harder to define each role
This is good, I prefer the API-to-Role mapping. This way we just need to add a section for the APIs to have a roles attachment as required. Roe-to-API is one way too but we would have to keep another section for this (which is an extra step we need to perform whenever a new API is added. Adding the auth logic to the already existing config makes sense too, instead of having an extra config file for it.
Okay! due to this comment and offline conversation, we will go with API-to-Role mapping with an additional constraint. We required defining the roles as well, so that an additional check can be made in each list of roles. E.g.,
authorization "rbac" {
role_list {
role "admin" {
desc = "The admin role allows full access. "
}
role "viewer" {
desc = "The viewer role has read-only access. "
}
}
auth_logic {
API "/api/healthcheck" {
allowed_roles = ["admin", "viewer"]
}
API "/api/agent/ban" {
allowed_roles = ["admin"]
}
}
}
Therefore, if there was a typo in the auth_logic such as:
API "/api/agent/ban" {
allowed_roles = ["admin1"]
}
Tornjak backend should not start and return the error of a bad config.
I really like your suggestion @maia-iyer . I also agree with @mamy-CS comments. Option 1 would generate a complexity and it would be error prone assuming we grow the API list. Regarding the startup evaluation and returning errors, I think we should consider following scenarios for testing:
- [ ] role listed, API referencing it --> OK
- [ ] role listed, no API referencing it --> OK, (good for testing)
- [ ] role not listed, API referencing it --> Error "Unknown Role (name) referencing the API (path name)"
- [ ] API with no role references --> That's interesting case. We have options: 1) we allow accessing the API without any role (publicly available), or 2.) we start the API but none can access it. It could be a good test case. Or the feature is available but disabled. We should discuss it.
- [ ] API referencing the path that does not have a corresponding service in Backend --> Error: Unknown API service path (path name)
I think that's all the cases. What do you think?
I agree with all the cases you defined.
For the fourth about API with no role reference,
- I definitely vote to not do 1. We should default to least privilege and allowing less. I would want a policy that allows all to be very explicit. So currently we can implement "allow all with bearer token" by manually including all defined roles.
- I don't think there is a functional difference between "start API with no access" and "disabled", but I suppose there is a visual error difference:
- Start API with no access leads to
403 Forbiddenerror. - Disabled leads to
404 not found
- Start API with no access leads to
I think I would prefer the Forbidden error for a couple reasons. I personally think this error is more expected and inline with the established pattern, and it gives less information about the configuration of the Tornjak backend.
This comment made me think of a couple other cases we haven't considered yet:
- Caller has no bearer token/is not authenticated
- Caller has a bearer token with no field
realm_access.roles
And particularly how to allow specific APIs to be called. I think right now we can ignore these cases and require the claims and the authentication. In the future we can extend the config sections for this logic