[WIP, but please review]: per-doc-access-control

Open janl opened this issue 1 year ago • 1 comments

This is a 2022 rebase of #3038 with many comments on that PR addressed.

This is still a WIP

Overview

This PR introduces the long-awaited first iteration of the per-doc-access-control feature (_access for short).

The goal of this feature is to make the db-per-user pattern obsolete and allow mutually untrusting users access to the same database.

To recap, the downsides of the db-per-user pattern are:

  • additional management is needed to create, secure and remove database(s) corresponding to a user’s lifecycle. The peruser add-on helps with this, but it is not always sufficient.
  • in order to support the very common requirement of running a query across all per-user databases, all those databases need to replicate into a central database (or other target). This adds significant CPU and IOPS resource usage to a CouchDB installation.
  • in large installations, most per-user databases will be very small (<100 kB), leading to an unfavourable file-descriptor to data-accessed ratio.
  • additional tuning is needed to make compaction behave safely in this scenario.
  • there’s probably more…

An _access-enabled database has these properties:

  • each user can only read and write their own documents
  • each user gets _changes and _all_docs responses scoped to only their own documents
  • admins can still access all documents, and create views across all documents for analysis

In future iterations, the _access feature could allow documents to be owned by multiple users and groups, differentiate readonly and readwrite permissions for each, and support per-user views. That is out of scope for this initial PR, but provisions have been made to make it possible.

Implementation Notes

The fundamental addition to the CouchDB API is threefold.

First, a database can be access-enabled at creation time. It is not possible to make an existing database access-enabled. You can create an access-enabled database with PUT /db?access=true (final option name to be bikeshed).
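As a minimal sketch of the creation request described above: the server address and credentials below are placeholders, the helper name `build_create_request` is hypothetical, and `access=true` is the option name proposed in this PR, which may still change.

```python
import base64
import urllib.parse
import urllib.request


def build_create_request(base_url, db_name, user, password):
    """Build (but do not send) a PUT request creating an access-enabled database.

    The `access=true` query parameter is the option name proposed in this PR
    and is not final.
    """
    query = urllib.parse.urlencode({"access": "true"})
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(db_name, safe='')}?{query}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Authorization": f"Basic {token}"},
    )


# Hypothetical server address and credentials, for illustration only.
req = build_create_request("http://127.0.0.1:5984", "notes", "admin", "secret")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Since an existing database cannot be converted, the flag has to be part of the very first PUT for that database.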

Second, the introduction of a new top-level document property _access: ["username"]. The current implementation requires non-admins to write docs that have this property. Docs without it are rejected (unless an admin writes them). The username in the first array element MUST match the user_ctx of any document CRUD request.
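The write rule above can be modelled roughly as follows. This is an illustrative sketch, not the code in this PR: the function name `may_write` is hypothetical, and it assumes a `user_ctx` of the form `{"name": ..., "roles": [...]}` where admins carry the `_admin` role.

```python
def may_write(doc, user_ctx):
    """Model of the write rule described above (not CouchDB's actual code)."""
    if "_admin" in user_ctx.get("roles", []):
        return True  # admins may write docs with or without _access
    access = doc.get("_access")
    if not isinstance(access, list) or not access:
        return False  # non-admins must supply a non-empty _access list
    name = user_ctx.get("name")
    # The first array element must match the requesting user.
    return name is not None and access[0] == name


alice = {"name": "alice", "roles": []}
admin = {"name": "root", "roles": ["_admin"]}

print(may_write({"_id": "a", "_access": ["alice"]}, alice))  # True
print(may_write({"_id": "b", "_access": ["bob"]}, alice))    # False
print(may_write({"_id": "c"}, alice))                        # False
print(may_write({"_id": "c"}, admin))                        # True
```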

Third, for access-enabled databases, both _changes and _all_docs now include a switch:

  1. admin users go straight to the existing by-seq and by-id indexes, just as before.
  2. non-admin users are directed to a new internal view that includes two sections, one corresponding to by-seq and one to by-id, but with the username from the _access property as a prefix. These indexes are then queried with the user property from the request’s user_ctx as hardcoded startkey/endkey.
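To illustrate the idea behind the prefixed index, here is a small in-memory model of the by-id section only. It is a sketch under assumptions: the real index lives in couch_mrview, and the names `build_by_access_id` and `all_docs_for` are made up for this example.

```python
import bisect


def build_by_access_id(docs):
    """Build a sorted list of (username, doc_id) keys, one per owned doc."""
    keys = []
    for doc in docs:
        access = doc.get("_access") or []
        if access:
            keys.append((access[0], doc["_id"]))
    keys.sort()
    return keys


def all_docs_for(index, username):
    """Range-scan the index, with the username acting as a fixed key prefix.

    This plays the role of the hardcoded startkey/endkey: the caller cannot
    widen the range beyond their own prefix.
    """
    lo = bisect.bisect_left(index, (username,))
    # username + "\x00" is the smallest string sorting after `username`.
    hi = bisect.bisect_left(index, (username + "\x00",))
    return [doc_id for _, doc_id in index[lo:hi]]


docs = [
    {"_id": "doc1", "_access": ["alice"]},
    {"_id": "doc2", "_access": ["bob"]},
    {"_id": "doc3", "_access": ["alice"]},
    {"_id": "doc4", "_access": ["al"]},
]
index = build_by_access_id(docs)
print(all_docs_for(index, "alice"))  # ['doc1', 'doc3']
print(all_docs_for(index, "al"))     # ['doc4']
```

Because the range bounds come from user_ctx rather than from request parameters, a user named `al` does not see documents owned by `alice`, even though the names share a prefix.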

This new internal view is implemented by way of a new query server that is somewhat modelled after the mango query server (reusing couch_index/couch_mrview as much as possible).

The consequence is that non-access-enabled databases should behave no differently than before in all aspects, API and operational.

Each user also now gets a new internal role _user automatically appended to their list of roles to simplify access control setup. If a database has this role in its _security object, it means: each authenticated user can access the database.
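A rough model of how the _user role could interact with a _security object is sketched below. The membership check is deliberately simplified (for example, it ignores CouchDB's handling of an empty members section), and the function names are hypothetical.

```python
def effective_roles(user_ctx):
    """Append the internal `_user` role for any authenticated (named) user."""
    roles = list(user_ctx.get("roles", []))
    if user_ctx.get("name") is not None and "_user" not in roles:
        roles.append("_user")
    return roles


def is_member(security, user_ctx):
    """Simplified membership check against a database's _security object."""
    members = security.get("members", {})
    if user_ctx.get("name") in members.get("names", []):
        return True
    return any(r in members.get("roles", []) for r in effective_roles(user_ctx))


# A _security object granting access to every authenticated user.
security = {
    "admins": {"names": [], "roles": []},
    "members": {"names": [], "roles": ["_user"]},
}

print(is_member(security, {"name": "alice", "roles": []}))  # True
print(is_member(security, {"name": None, "roles": []}))     # False
```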

The replicator has been expanded to create replication checkpoints with an _access property as well. External replication clients like PouchDB will have to be updated accordingly.

I tried to make the PR easy to follow with logical commits to each section of CouchDB + some cleanup at the end. I suggest once satisfactory, this should be squashed into a single commit with an updated version of this PR text as the commit message.

The access feature can be globally disabled by server config.

This feature should work well with partitioned databases (in fact the combination should be a great benefit), but this has not yet been verified.

Implementation State

This PR comes with extensive tests covering all desired behaviour and it works and passes all tests.

There are a few cosmetic things to be discussed (all are already comments to this PR).

There is a performance regression in this PR that still has to be investigated, but it might be addressed by changing the current PR-behaviour in one detail. See this comment for more details about handling conflicted docs.

Future Work

Depending on how hard it is to do this correctly, we may or may not include expanding the _access property. It is designed to hold a list of users and groups that have access to a document. We will have to think through the consequences a bit more, but it might be that the only two things that are needed are:

  1. expand the access query server to index the whole _access property and add a row for the doc for each entry
  2. expand the doc CRUD validation to look at all entries in the whole _access property.
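The two steps above might look roughly like this. It is speculative, since this part is not implemented in the PR: the function names are hypothetical, and matching group entries against the user's roles is an assumption made only for illustration.

```python
def index_rows(doc):
    """Emit one (entry, doc_id) row per _access entry, not only the first."""
    return [(entry, doc["_id"]) for entry in doc.get("_access", [])]


def may_access(doc, user_ctx):
    """Check every _access entry against the user's name and roles.

    Treating roles as group identities is an assumption of this sketch.
    """
    identities = {user_ctx.get("name"), *user_ctx.get("roles", [])}
    identities.discard(None)
    return any(entry in identities for entry in doc.get("_access", []))


doc = {"_id": "d1", "_access": ["alice", "team-a"]}
print(index_rows(doc))                                          # [('alice', 'd1'), ('team-a', 'd1')]
print(may_access(doc, {"name": "bob", "roles": ["team-a"]}))    # True
print(may_access(doc, {"name": "carol", "roles": []}))          # False
```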

But if this proves too complicated, I’m okay with merging this PR without multi-user and group support.

We might also want to add readonly and readwrite tags to _access entries for even more fine-grained access control.

This PR does not yet include “users can create views on their share of the database”, but I’d like to add this feature eventually.


There is a corresponding RFC that still needs updating and has comments that have not been addressed yet. My goal is to produce an up-to-date RFC by the time this PR is ready to merge.

Next Steps

  1. I’d like a wider review of this to see if there are any obvious places that need addressing.
  2. I’d like to nail down the remaining design question about conflicted docs.
  3. I’d like to find the performance regression.

janl, Aug 06 '22 14:08

Original comment

This is great, gratz!

One question: it’s not clear to me how /_security response will look for buckets having access restrictions, is there a special field for those restrictions? I mean I want to explicitly mark buckets of the kind in Photon, so how can I detect _access restricted buckets reading /_security endpoint?

RFC states admins can grant individual users and groups access to a database using the database’s _security object, no details.

https://github.com/apache/couchdb/pull/3038#issuecomment-665578748

janl, Aug 06 '22 14:08

From the PR description:

Each user also now gets a new internal role _user automatically appended to their list of roles to simplify access control setup. If a database has this role in its _security object, it means: each authenticated user can access the database.

This needs rethinking/clarifying: we would still want to allow two databases to have a non-overlapping set of users (as per _users) that can access one database but not the other.

janl, Nov 11 '22 11:11

superseded by https://github.com/apache/couchdb/pull/4673 — will port remaining relevant comments over

janl, Jul 12 '23 15:07