feature/rbac-2
New Features
- Introduced Role-Based Access Control (RBAC) to indexd.
- Added support for enforcing authorization on database operations via Arborist (
adds RBAC to db operations). - Integrated a cached call to Arborist to reduce authorization lookup overhead (
add cached call to Arborist). - Added configuration flag to enable or disable RBAC enforcement (
Adds RBAC config).
- Added support for enforcing authorization on database operations via Arborist (
Breaking Changes
- None. The RBAC feature is gated by a configuration flag and maintains backward compatibility when disabled.
Bug Fixes
- Improved error messaging for unauthorized access and token validation (
Improve error handling).
Improvements
- Added developer documentation for RBAC configuration and usage (
developer documentation). - Added test coverage for RBAC behavior (
test RBAC).
Dependency updates
- None.
Deployment changes
- New optional configuration setting:
RBAC. When set toTrue, RBAC enforcement is active for protected records. - Arborist must be reachable by the indexd service for RBAC to function properly.
Not yet addressed:
- [x] Ensure ARE_RECORDS_DISCOVERABLE, GLOBAL_DISCOVERY_AUTHZ See discussion
- [ ] Add a corresponding feature flag to helm chart
@Avantol13 - could you review and comment
🧾 User Story 1: Control Whether Records Are Discoverable
Title: Configurable Discovery of Indexd Records
As a platform operator, I want to control whether indexd records are discoverable at all via a config flag, So that I can prevent users from listing or retrieving records unless explicitly permitted.
Acceptance Criteria
- [x] Given
ARE_RECORDS_DISCOVERABLE=False, when a client sends a request (with valid token):- indexd returns a
403 Forbiddenfor all reads with id e.g.GET /index/<did>- should only 403 in situations where this record itself would've been filtered out.
- indexd filters out all records without read permission e.g.
GET /index
- indexd returns a
- [x] Given
ARE_RECORDS_DISCOVERABLE=True, the RBAC rules are ignored - [x] This behavior is documented in
indexd_settings.pyand the README with a description of impact on runtime behavior.
🧾 User Story 2: Global Discovery Authorization Control
Title: Global Discovery Authz for Indexd Records, Support Discovery Access Independent from File Access
As a system administrator, I want to configure a global authorization group for reading/discovering indexd records, So that discovery can be gated separately from file access and we can support user registration workflows.
As a data commons architect, I want to decouple discovery access (e.g. listing/searching records) from access to the underlying files, So that I can implement workflows like "register to see what’s available", then "apply for access to download".
Acceptance Criteria
Assuming ARE_RECORDS_DISCOVERABLE=False
- [x] If
GLOBAL_DISCOVERY_AUTHZ=None, then RBAC will use record-levelauthzfields are used to authorizeGETrequests to records. ie then record-levelauthzcontinues to govern access to records. - [x] If
GLOBAL_DISCOVERY_AUTHZ is setand if a user has permissions to the resource set in GLOBAL_DISCOVERY_AUTHZ, then RBAC will ignore filters for record-levelauthzfields and return all records. - [x] Behavior is clearly documented, including the override effect of
GLOBAL_DISCOVERY_AUTHZ.
📌 Configuration Summary
# Whether any records are discoverable at all
ARE_RECORDS_DISCOVERABLE = True # default: True
# Override per-record authz for GET/read
# Only applies to record discovery (not file access)
# If None, use per-record `authz`
GLOBAL_DISCOVERY_AUTHZ = ["/indexd/discovery"]
In general the comments above look good, thanks for all the detail.
This part:
indexd returns a 403 Forbidden for all reads with id e.g. GET /index/
I think needs to actually behave similar to READ filtering based on config. In other words, if you request a did and you do have access to authz, this should return 200. If you request a did and do you have access to the global authz that's configured, this should return 200. Basically this should only 403 in situations where this record itself would've been filtered out.
In general the comments above look good, thanks for all the detail.
This part:
indexd returns a 403 Forbidden for all reads with id e.g. GET /index/
I think needs to actually behave similar to READ filtering based on config. In other words, if you request a did and you do have access to authz, this should return 200. If you request a did and do you have access to the global authz that's configured, this should return 200. Basically this should only 403 in situations where this record itself would've been filtered out.
Thanks. I edited the comment above
- [x] squash commits
@Avantol13 I've addressed all PR items. Please see https://github.com/uc-cdis/indexd/pull/405#discussion_r2369327829 for a followup question.
Using ContextVar to decouple the web tier from the database driver
See https://github.com/ohsu-comp-bio/indexd/blob/672f16b469298d27bb0782627ad1df05ebd65efb/indexd/auth/discovery_context.py
Why this pattern
Current request-scoped values (Bearer token, Arborist resources) are collected in the web tier but consumed in the data tier (for row-level security, audit columns, tenancy filters, per-request logging, etc.). Passing these values explicitly through every call (blueprints → services → repositories) scatters boilerplate and causes wide signature churn.
contextvars.ContextVar provides implicit, request-scoped propagation without coupling the DB code to Flask. It gives you:
Separation of concerns: web code gathers state once; DB/ORM code reads it when needed.
-
Zero signature churn: no need to thread updated parameters through every function.
-
Async/greenlet safety: unlike threading.local() and many ad-hoc globals, ContextVar is designed for async tasks and framework context switches.
-
Testability: tests can set/reset context in one place without building fake requests.
-
Framework-agnostic DB layer: the driver/ORM code depends only on ContextVar, not on flask.request, flask.g, or app globals.
Clarification question:
How will a web tier decorator will retrieve the record's authz? There are several possible database calls, each with their own set of possible parameters. Unclear how to a decorator would manage this. Assuming it could, wouldn't that double the db IO count for successful queries (once in the decorator + once in the driver)?
from above
Clarification question:
How will a web tier decorator will retrieve the
record's authz? There are several possible database calls, each with their own set of possible parameters. Unclear how to a decorator would manage this. Assuming it could, wouldn't that double the db IO count for successful queries (once in the decorator + once in the driver)?from above
![]()
The decorator would need to use the driver to interact as the data access layer (DAL) based on the content of the request. e.g. take GUID, ask DAL for record, get GUID's authz from record. In the backwards-compatible case (where discovery is open) then all this logic should get skipped. In the new case, this does mean that you db hit to get the authz.
I'm open to reconsidering the decorator as the approach (this was just a cursory idea) so long as we figure out a way that the separation b/t web / DAL exists appropriately, but note that the decorator is just a function wrapping the decorated function, so you can pass in variables. e.g. theoretically you could get the index record from the db in the decorator logic (to get authz) and then pass the whole record into the endpoint function to avoid a double db hit (and in the case where we don't use this feature, just pass none and then hit the db if you don't have the record when you need it).
API Endpoints Documentation
A preliminary list of blueprint endpoints that are impacted by discovery. Ordered by db driver and method.
indexd/index/drivers/alchemy.py
-
ids() - Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/
- Method: POST
- URL: /index/records - Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/records - Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/objects
-
get_urls()
- Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/urls
- Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/urls
- Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/urls
- Blueprint: indexd.blueprint.indexd
-
get_all_versions()
- Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/versions/<record_id>
- Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/versions/<record_id>
- Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/objects/<record_id>/versions
- Blueprint: indexd.blueprint.indexd
-
get_latest_version()
-
Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/latest_version/<record_id>
-
Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/latest_version/<record_id>
-
Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/objects/<record_id>/latest_version
-
-
get()
-
Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/<record_id>
-
Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/<record_id>
-
Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/objects/<record_id>
-
-
get_with_nonstrict_prefix()
-
Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/prefix/
-
Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/prefix/
-
Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/objects/prefix/
-
-
get_by_alias()
-
Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/alias/
-
Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/alias/
-
Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/objects/alias/
-
-
get_aliases_for_did()
-
Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/aliases/
-
Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/aliases/
-
Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/objects/aliases/
-
-
get_all_versions()
-
Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/versions/<record_id>
-
Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/versions/<record_id>
-
Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/objects/<record_id>/versions
-
indexd/index/drivers/query/urls.py:
-
query_urls()
-
Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/urls
-
Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/urls
-
Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/urls
-
-
query_metadata_by_key()
-
Blueprint: indexd.blueprint.indexd
- Method: GET
- URL: /index/metadata/query
-
Blueprint: indexd.blueprint.admin
- Method: GET
- URL: /admin/index/metadata/query
-
Blueprint: indexd.blueprint.drs
- Method: GET
- URL: /ga4gh/drs/v1/metadata/query
-
sequenceDiagram
autonumber
actor Client
participant Flask as Flask App
participant Hook as "before_request: ensure_auth_context()"
participant Arborist as "Arborist (auth service)"
participant Ctx as "ContextVar<request_ctx>"
participant Repo as "Driver.method()"
participant Decorator as "@authorize_discovery"
participant DB as "Indexd DB"
Client->>Flask: HTTP GET /ids …
note over Flask,Hook: middleware, called before all blueprints
Flask->>Hook: Trigger before_request
Hook->>Flask: Read config\nARE_RECORDS_DISCOVERABLE, GLOBAL_DISCOVERY_AUTHZ
alt are_records_discoverable == false
Hook->>Arborist: auth.get_authorized_resources()
Arborist-->>Hook: {resource -> [permissions]}
Hook->>Hook: filter for read@indexd|*\ncompute can_user_discover
else are_records_discoverable == true
Hook->>Hook: can_user_discover = true\nauthorized_resources = []
end
Hook->>Ctx: request_ctx.set({\n are_records_discoverable,\n can_user_discover,\n authorized_resources\n})
note over Flask: call specific endpoint e.g. `/index`
Flask->>Decorator: call Driver.ids(...)
note over Decorator,Repo: Inject can_user_discover, authorized_resources only if args missing/None
Decorator->>Ctx: get_auth_context()
Decorator->>Repo: ids(...,\n can_user_discover, authorized_resources)
Repo->>DB: query with discovery rules\n(ACL/authz filters)
DB-->>Repo: results
Repo-->>Flask: IDs payload
Flask-->>Client: 200 OK (JSON)
note over Ctx: Framework-agnostic state\n(no Flask import in Driver)
Request hits Flask → ensure_auth_context seeds a ContextVar with discovery flags. The @authorize_discovery decorator then pulls can_user_discover and authorized_resources and injects them into wrapped function, for example ids() , so the repo/DB layer stays Flask-agnostic and only consumes plain function parameters.
Thanks for the thoughtful review, @Avantol13 I’ve aligned the behavior and added a small infrastructure layer to make it reliable and testable.
What changed (referencing commit 2064504)
-
Discovery context + decorator. Introduced a
ContextVar-backeddiscovery_contextand an@authorize_discoverydecorator that injectscan_user_discoverandauthorized_resourcesinto the driver entry points. This keeps the repo/DB code Flask-agnostic and avoids depending onflask.requestwhile still giving us request-scoped auth state. See commit “adds authorize_discovery decorator” (206450476c81bec42cff222702093c2a3583eb50). ([GitHub]) -
Plain-parameter injection. The affected driver functions (e.g.,
ids(...)see above) were updated to accept plain old parameterscan_user_discoverandauthorized_resourceswhich the decorator injects when not explicitly provided by the caller. -
This commit includes
indexd/auth/discovery_context.pyand related wiring plus unit tests to verify the injection semantics. ([GitHub])
Why this addresses the concern
The updated logic and the discovery context make that true for both list/search and direct-DID paths, while keeping the enforcement code decoupled from Flask and easy to unit test. ([GitHub])
Impact on blueprints & compatibility
-
No blueprint rework required. The change is wired at the app edge (before_request seeding the context) and at the repository boundary via the decorator. Blueprint routes/endpoints remain untouched.
-
Preserves existing functionality. This is additive and non-breaking; no API or schema changes.
-
Existing tests continue to pass. The current test suite runs without modification; new unit tests cover the decorator/context behavior.
Follow-ups
- I’ll open a small PR in gen3-helm to surface
ARE_RECORDS_DISCOVERABLEandGLOBAL_DISCOVERY_AUTHZas explicit Helm values and update docs accordingly (not done yet; tracked as a follow-up from the earlier discussion). ([GitHub][2])
If anything looks off in the commit, I’m happy to tweak. Thanks again for the clear guidance, this made the behavior much more predictable.
Summary of the changes made in commit 0d7f5d706334ff9b5023f4bce39dfb3b7e7b7afd :
A single change was made to a single blueprint method,get_all_index_record_versions, to verify the desired changes.
If viable, this pattern would be repeated in all the blueprints listed above and deprecated methods (ensure_auth_context, authorize_discovery) removed.
These changes refactor how authorization context is handled and propagated. Instead of relying on global or context-setting side effects, the relevant authorization variables are now explicitly returned and passed through function calls, improving code clarity and maintainability. The changes also update method signatures to accept authorization parameters, and remove an authorization decorator in favor of direct parameter handling.
1. Auth Context Refactor
-
Introduced a new function:
auth_context()inindexd/auth/discovery_context.py- Returns a tuple:
(are_records_discoverable, can_user_discover, authorized_resources)instead of directly setting context. - The previous logic in
ensure_auth_context()was split—nowensure_auth_context()callsauth_context()and sets context using its return values.
- Returns a tuple:
2. Blueprint Updates
-
In
indexd/index/blueprint.py:- Imports and uses the new
auth_context()function. - Updates the
get_all_index_record_versionsview to callauth_context()and passcan_user_discoverandauthorized_resourcesto the index driver’sget_all_versions()method.
- Imports and uses the new
3. Index Driver Changes
-
In
indexd/index/drivers/alchemy.py:- The
get_all_versionsmethod signature now explicitly acceptscan_user_discoverandauthorized_resourcesparameters (defaulting toTrueandNone). - Removes the
@authorize_discoverydecorator fromget_all_versions. - Updates
_enforce_record_authzto check forauthorized_resourcesbefore enforcing authorization.
- The
-
In
indexd/index/drivers/single_table_alchemy.py:- Updates
get_all_versionsmethod to acceptcan_user_discoverandauthorized_resourcesparameters, to comply with the changed method signature. Note, this is the first change to this module, as it was not impacted by other design iterations above
- Updates
Hey @bwalsh, per
If viable, this pattern would be repeated in all the blueprints listed above and deprecated methods (ensure_auth_context, authorize_discovery) removed.
Could you go ahead and make all those necessary changes so we can see the overall new approach without it being mixed in with the old approach? We'll have the git history if we need to refer back or revert, but it's hard to understand the implications without making the full change.