✨ permissions model
Implement a user permissions model. The initial idea is to define permission restrictions on the datasets and pass a user model in with the query. This is not RBAC; it is context-aware, with role being one data point in the context.
The permissions on the datasets are to be defined like this (YAML version; other forms should be acceptable):
access_requires:
  - user.role: 'analyst'
    user.location: field.location
  - user.role: 'security analyst'
Interpreted as:
Access requires the user to have the 'analyst' role AND the user's location to match the 'location' field in the data, OR the user to have the 'security analyst' role.
Evaluation would be permissive when multiple rules apply (e.g. the user has both roles in the example): access is granted if any one rule is satisfied.
Literal strings are written in quotes, and 'field' references refer to values in the dataset. Literal numbers and booleans are not quoted; literal dates are quoted. Other qualifiers could be added if needed, e.g. 'dataset' as a collection of attributes about the dataset.
Permissions without 'field' conditions would be dataset-level, and permissions with 'field' conditions would be row-level. Row-level permissions would be expected to be slower because additional checks run for every row in the dataset.
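As a rough illustration of the dataset-level versus row-level split, the rules could be partitioned once up front so that only row-level rules pay the per-row cost. This is a minimal sketch, not a proposed implementation: the helper names (`is_row_level`, `split_rules`) are hypothetical, and it assumes the rules have already been parsed into Python dictionaries. Since a standard YAML loader returns both `'analyst'` and `field.location` as plain strings, the sketch also assumes the `field.` prefix is what marks a field reference.

```python
from typing import Any, Dict, List, Tuple

Rule = Dict[str, Any]


def is_row_level(rule: Rule) -> bool:
    """A rule is row-level if any condition compares against a dataset field."""
    # Assumption: field references are strings starting with 'field.'.
    return any(
        isinstance(value, str) and value.startswith("field.")
        for value in rule.values()
    )


def split_rules(rules: List[Rule]) -> Tuple[List[Rule], List[Rule]]:
    """Partition rules into (dataset_level, row_level)."""
    dataset_level = [rule for rule in rules if not is_row_level(rule)]
    row_level = [rule for rule in rules if is_row_level(rule)]
    return dataset_level, row_level


# The YAML example from above, as it would look once parsed.
access_requires = [
    {"user.role": "analyst", "user.location": "field.location"},
    {"user.role": "security analyst"},
]

dataset_level, row_level = split_rules(access_requires)
```

With this split, a user who satisfies any dataset-level rule could skip the per-row checks entirely.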
The user model is a dictionary (or pydantic model) that describes the user. It has no fixed attributes; missing attributes evaluate to None, which does not match any rule.
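The "missing attributes evaluate to None and never match" behaviour might look something like the sketch below. The function names are hypothetical, and a plain dataclass stands in for a pydantic model so the example has no third-party dependencies; attribute access on a pydantic model would work the same way through `getattr`.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


def user_attribute(user: Any, name: str) -> Any:
    """Read an attribute from a dict-like or object-like user model.

    Missing attributes evaluate to None rather than raising.
    """
    if isinstance(user, Mapping):
        return user.get(name)
    return getattr(user, name, None)


def attribute_matches(user: Any, name: str, expected: Any) -> bool:
    """None never matches, even when the expected value is itself None."""
    actual = user_attribute(user, name)
    return actual is not None and actual == expected


@dataclass
class User:
    """Stand-in for a pydantic model (assumption, for illustration only)."""
    role: str
    location: str


dict_user = {"role": "analyst"}  # no 'location' attribute
object_user = User(role="analyst", location="London")
```

Treating None as "never matches" means a rule cannot be satisfied by accident when both the user attribute and the compared value are absent.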
The user model, the data attributes and the access definition are then combined to filter what the user can access.
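To make the intended semantics concrete, here is a small pure-Python sketch of the evaluation: conditions within a rule are ANDed, rules are ORed, and a missing user attribute never matches. All names are hypothetical, the user is assumed to be a plain dictionary, only `user.` keys are handled, and the `field.` prefix is assumed to mark a reference to a value in the row.

```python
from typing import Any, Dict, List

Rule = Dict[str, Any]
Row = Dict[str, Any]

# Assumption: the same rules as the YAML example, already parsed into Python.
ACCESS_REQUIRES: List[Rule] = [
    {"user.role": "analyst", "user.location": "field.location"},
    {"user.role": "security analyst"},
]

ROWS: List[Row] = [
    {"location": "London", "event": "login"},
    {"location": "Paris", "event": "logout"},
]


def condition_holds(key: str, expected: Any, user: Dict[str, Any], row: Row) -> bool:
    """Check one 'user.<attr>: <literal or field.<name>>' condition."""
    if not key.startswith("user."):
        return False  # only 'user.' qualifiers are handled in this sketch
    user_value = user.get(key[len("user."):])
    if user_value is None:
        return False  # missing user attributes never match
    if isinstance(expected, str) and expected.startswith("field."):
        expected = row.get(expected[len("field."):])  # resolve the field reference
    return user_value == expected


def row_allowed(rules: List[Rule], user: Dict[str, Any], row: Row) -> bool:
    """Permissive evaluation: any rule whose conditions all hold grants access."""
    return any(
        bool(rule) and all(condition_holds(k, v, user, row) for k, v in rule.items())
        for rule in rules
    )


def filter_rows(rules: List[Rule], user: Dict[str, Any], rows: List[Row]) -> List[Row]:
    """Keep only the rows the user is allowed to see."""
    return [row for row in rows if row_allowed(rules, user, row)]


london_analyst = {"role": "analyst", "location": "London"}
security_analyst = {"role": "security analyst"}
analyst_without_location = {"role": "analyst"}
```

Here `filter_rows(ACCESS_REQUIRES, london_analyst, ROWS)` keeps only the London row, the security analyst sees both rows, and the analyst with no location sees nothing.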
These could be applied as selection pushdowns using pyarrow Expressions, so data the user may not see never reaches Opteryx proper.
It should be possible to define a default set of permissions, e.g. to allow access to an admin group but not analysts, without needing to specify this each time.
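How defaults interact with dataset-specific rules is still open. One possible reading, sketched below, is that the defaults act as a fallback when a dataset defines no `access_requires` of its own; an additive reading (defaults ORed with the dataset's rules) would be equally plausible. The names `DEFAULT_ACCESS_REQUIRES` and `effective_rules` are hypothetical.

```python
from typing import Any, Dict, List, Optional

Rule = Dict[str, Any]

# Assumption: the default grants access to admins only.
DEFAULT_ACCESS_REQUIRES: List[Rule] = [{"user.role": "admin"}]


def effective_rules(
    dataset_rules: Optional[List[Rule]],
    defaults: List[Rule] = DEFAULT_ACCESS_REQUIRES,
) -> List[Rule]:
    """Use the dataset's own rules if it has any, otherwise fall back to the defaults."""
    if dataset_rules:
        return list(dataset_rules)
    return list(defaults)


unrestricted_dataset = effective_rules(None)
restricted_dataset = effective_rules([{"user.role": "analyst"}])
```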
The engine would expect the permissions and user information to be pushed to it, rather than fetching them itself, e.g.
import opteryx
conn = opteryx.connect(user=user_model, permissions=permissions)
...
See also #329, user should return this identity.
Further Reading
- https://medium.com/nerd-for-tech/data-security-is-a-data-asset-problem-not-a-data-schema-one-a1287931dd02