couchdb Per document access control

first draft implementation of https://github.com/apache/couchdb-documentation/pull/424 (which is itself a little out of date, but it paints the big picture).

Jul 12 '23 15:07 janl

Currently this PR has no description, and I wonder if you could add some high level documentation of how this works, or maybe link to an architectural document?

sorry for not making this explicit, but if you follow the superseded PRs backwards, you should find more commentary that I didn’t think was necessary to repeat here. The main design is outlined here: https://github.com/apache/couchdb-documentation/pull/424

Aug 07 '23 09:08 janl

I also noticed a few TODO and other comments that I'm guessing need to still be addressed or removed?

thanks and yes, these are all minor or already resolved, but not cleaned up yet. None of these should be major changes. I wanted to get a review on the general shape before getting everything into perfect shape.

Also thanks for the compiler warnings pass, I’ll get to those.

Aug 07 '23 09:08 janl

latest push fixes compiler warnings noted by Jay

Aug 07 '23 09:08 janl

I am coming back to couchdb updating an old application that has already implemented the above (per-document access, with linux-like permissions like owner/read/write access, even per-field protection on the doc). I personally don't like this feature and wouldn't like to see it merged:

This kind of security requirement is almost always specific to the application. In my case, for example, this solution is not fit for purpose.
Most of the time a front-end application server is better suited to handle security requirements, which might be complex.
I believe CouchDB should stick to being a database. We should refine the current codebase instead.
If we need to implement this kind of functionality (like in my case), we are better off providing a plugin/hook system where we can enter the validate and read lifecycle of the document. Devs are free to implement what they want and we could provide a working example. (This is what I did). Providing a _custom_<field> on documents for devs to freely use would be great.
Are we going to filter the views? Are we going to filter the document counts on views? What about leaking information on views? Some views need some info, some others not? Will I sometimes use per-user view, sometimes the global views? And what about searching with Lucene? I feel we should not touch our indices.
Replicating documents into your database where the _access field changes, are we doing that? Will devs want a "master" where the _access field is allowed to change, and slaves where it isn't? Is the ultimate purpose of this PR to segregate documents within your own DB? Then why not create per-user DBs?

I would be in favour of document lifecycle plugins/hooks, where we can implement this feature and many others.

Sep 07 '23 10:09 arturog

@arturog

I think secondary indexes would need to be updated to support this for it to have the desired effect (restricting a user's visibility of a database to a subset). I'm sure we've not done that work for dreyfus/clouseau (and I definitely haven't for nouveau).

I share your concern that this implementation might not meet everyone's needs, but that's not necessarily a blocker. It just needs to meet a significant number of people's needs.

I don't find "I believe CouchDB should stick to being a database." compelling. Other databases have richer permissions and access controls than we currently do; they're not off-topic.

The PR changes a number of internal records, which prevents a smooth upgrade of existing CouchDB clusters, and it is blocked from merging until that is fixed imo, so there's time to think.

We currently have validate_doc_update where you can implement any kind of write control logic, but no equivalent for reads (or view queries besides the deprecated _list option). If JavaScript were evaluated more efficiently (cf. the quickjs embedding idea) we might add that.
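For readers unfamiliar with the write-side hook mentioned above, here is a minimal sketch of a validate_doc_update function. The four-argument signature and the thrown `forbidden` / `unauthorized` objects follow CouchDB's documented convention; the `owner` field and the ownership rule are assumptions made up for illustration, not anything this PR defines.

```javascript
// Sketch of a validate_doc_update function as it might be stored in a design
// document. ASSUMPTION: the `owner` field is a hypothetical application
// convention, not a CouchDB built-in.
function validateDocUpdate(newDoc, oldDoc, userCtx, secObj) {
  // Users carrying the _admin role may write anything.
  if (userCtx.roles.indexOf("_admin") !== -1) {
    return;
  }
  // Anonymous requests have a null name.
  if (!userCtx.name) {
    throw { unauthorized: "You must be logged in to write documents." };
  }
  // On update or delete, only the current owner may change the document.
  if (oldDoc && oldDoc.owner !== userCtx.name) {
    throw { forbidden: "Only the owner may modify this document." };
  }
  // On create or update, the owner field must name the writer.
  if (!newDoc._deleted && newDoc.owner !== userCtx.name) {
    throw { forbidden: "The owner field must match the authenticated user." };
  }
}
```

Note that nothing comparable runs on the read path, which is the gap being discussed.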

Sep 07 '23 12:09 rnewson

We currently have validate_doc_update where you can implement any kind of write control logic, but no equivalent for reads (or view queries besides the deprecated _list option). If JavaScript were evaluated more efficiently (cf. the quickjs embedding idea) we might add that.

We implemented a fork of CouchDB with exactly that (a validate_doc_read), with code written in Erlang. We also added our own validate_doc_write in Erlang before the JS one kicks in. If extended, I believe it can provide not only this feature, but much more. Also, the issue of leaking out data in indices (especially to the Lucene index) led me to limit indexing to only public fields -- so the field-access was also born. Indices in CouchDB were kept untouched, and though all views had the full set of _ids, on document access (or requesting the full document via views), you would get <unauthorized>. Our implementation was not meant to present a "subset" of documents, but instead control access to the document.
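The behaviour described above (views keep every _id, but document bodies are checked on access) could be sketched roughly as follows. This is purely illustrative: upstream CouchDB has no validate_doc_read hook, the fork described here was written in Erlang, and the `readers` field, `validateDocRead`, and `readRows` are hypothetical names invented for this sketch.

```javascript
// HYPOTHETICAL read hook. ASSUMPTION: each document lists the users allowed
// to read it in a `readers` array; CouchDB itself defines no such field.
function validateDocRead(doc, userCtx) {
  if (userCtx.roles.indexOf("_admin") !== -1) {
    return;
  }
  const readers = doc.readers || [];
  if (readers.indexOf(userCtx.name) === -1) {
    throw { unauthorized: "You may not read this document." };
  }
}

// Every row keeps its id (the index is untouched); only the document body
// is withheld when the read check fails.
function readRows(rows, userCtx) {
  return rows.map(function (row) {
    try {
      validateDocRead(row.doc, userCtx);
      return { id: row.id, doc: row.doc };
    } catch (e) {
      if (e && e.unauthorized) {
        return { id: row.id, error: "unauthorized" };
      }
      throw e; // unexpected errors are not swallowed
    }
  });
}
```

In this model a user still learns that a document exists, which is exactly the difference from presenting a filtered subset of the database.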

Perhaps the requirement to present a subset of the database needs to be revisited? If this requirement were to be dropped, things might become easier.

Sep 07 '23 14:09 arturog

@arturog thank you for your comments, practical experience with variants of the same idea is invaluable here.

This kind of security requirement is almost always specific to the application. In my case, for example, this solution is not fit for purpose. Most of the time a front-end application server is better suited to handle security requirements, which might be complex.

I’m 100% on board with you here. That’s why this is designed as an opt-in feature. If you have a system on top of CouchDB that works for you, that will just continue to work.

To avoid confusion, let me restate the design goal of this PR: “make the db-per-user pattern obsolete”. It is specifically not “support arbitrary ACLs inside a database”.

With that out of the way, the scope of the potential work for a complete solution here is very big. In accordance with common wisdom: to build a complex system that works, one first has to build a simple system that works. The current PR reflects the simplest system that I could think of to get the minimal functionality working that satisfies the design goal.

To this end, per-user design docs (and all associated indexes of any form, JS, Mango, Search/Nouveau) are not supported. A future version of this PR however can use the new by-access-seq index to build per-user indexes, including reduces that are guaranteed to not leak any data. Global views are already supported but are limited to database admins.
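As a rough mental model of the "each user sees their own subset" goal, the sketch below filters documents by an `_access` field. It is an assumption-laden illustration, not the PR's Erlang implementation: it assumes `_access` holds an array of user or role names, and `canAccess` and `visibleDocs` are hypothetical helpers.

```javascript
// ASSUMPTION: `_access` is an array of user names and/or role names.
// This models the visible subset only; it says nothing about how the
// by-access-seq index is actually built.
function canAccess(doc, userCtx) {
  if (userCtx.roles.indexOf("_admin") !== -1) {
    return true; // admins see the whole database
  }
  const principals = userCtx.roles.slice();
  if (userCtx.name) {
    principals.push(userCtx.name);
  }
  const access = doc._access || [];
  return access.some(function (entry) {
    return principals.indexOf(entry) !== -1;
  });
}

// The subset of documents a given user would see.
function visibleDocs(docs, userCtx) {
  return docs.filter(function (doc) {
    return canAccess(doc, userCtx);
  });
}
```

Unlike a read hook that returns an error per document, documents outside the subset simply do not appear, which is what lets one database stand in for many per-user databases.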

Perhaps the requirement to present a subset of the database needs to be revisited? If this requirement were to be dropped, things might become easier.

Maybe you want a different feature? You are saying that the explicit design goal of this PR should be dropped.

Sep 12 '23 16:09 janl