couchdb
couchdb copied to clipboard
Prevent users from modifying the special _replication fields
Overview
The _replication_*
fields are reserved for couchdb internal usage (indicated by the underscore prefix). Nonetheless the replicator docs are still editable by local admins. In order not to mislead them into thinking that editing these fields has any intentional effect, this PR prevents them from saving an update that changes any of these fields.
Testing recommendations
Try to change these fields as an admin (one without the reserved _replicator
role) and confirm it fails.
Related Issues or Pull Requests
N/A
Checklist
- [x] Code is written and works correctly
- [ ] Changes are covered by tests
- [ ] Any new configurable parameters are documented in
rel/overlay/etc/default.ini
- [ ] A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation
This is a nice usability improvement. One interesting corner case to consider is when users replicate the _replicator
database. Then, the updates would seem as if they are coming from the user and not the replicator and the new VDU update would reject those documents. Perhaps that is a cleaner way to handle it, anyway. But it would be an incompatibility.
If validate_doc_update
has a replicated_changes | new_edits
option passed in, we could allow an "replicated externally" exception for compatibility. After a cursory look I couldn't see if we pass that flag into the VDU callback.
I'm tempted to expand to any property beginning with _replication
in case we invent new ones. And should this be inside the if (!newDoc._deleted)
section?
A prefix check could make sense. We should probably synchronize it with https://github.com/apache/couchdb/blob/3.x/src/couch/src/couch_doc.erl#L346-L373 and have a prefix check there too?
Putting it under if (!newDoc._deleted)
would work.
But, overall, still not sure if this would break replication of the _replicator
dbs. Yeah it is a bit odd to replicate the _replicator db, but it might still be something users do. Those updates would look like user _bulk_doc
updates with new_edits:false
flag, and won't have the _replicator
role. I don't think we've explicitly considered what should happen in that case across all the combination matrix of old -> new state updates.
we have a general prohibition at https://github.com/apache/couchdb/blob/3.x/src/couch/src/couch_doc.erl#L377 for unknown special fields. I'm not sure it's wise to expand the couch_doc clauses to allow all _replication_
-prefixed fields but others can comment if they feel strongly.
replications of _replicator
db should be unaffected as the first clause allows any write by a user with the _replicator
role.
The PR is trying to prevent users from being misled if they attempt to influence the replicator in undocumented ways.
The _replicator
role isn't applied to replicated _bulk_docs
writes to the replication endpoints. In that case, the write is indistinguishable from any other user _bulk_docs
replicated POST request.
In other words, if user has backed up their replication job on $backup_sever/_replicator/repjob
then they replicate $backup_server/_replicator -> $server/_replicator
. The _replicator
role is not applied when the repjob
document is written. It is only applied when the replication app running on $server
has started running that job, and wants to do a state update from "triggered"
to "failed"
, for example.
So, this would be one difference in behavior that I am not 100% sure about. Previously users were able to replicate replicator docs in any state, but after this change those documents could be skipped over.
I made a script to demonstrate it https://gist.github.com/nickva/6ad4f7a284fe410d54da5affeb0e43de
It seems to me like there's a potential for a slippery slope into doing full JSON schema checking. I.e. in some context a confused admin may think replacing a property that is a string with an array or other type is meant to have an effect or that this secures against tampering with auditable history. This leads me to wonder if warnings in the fauxton's editor could be extended for more document-type specific checks and admins who edit directly should continue to do so at their own risk?
The _replicator
database should not have been user-accessible from the start. We should have put it behind an api endpoint with defined semantics. Too late to change it now, hence the approach in this PR. As for additional testing in fauxton, I'm skeptical. The best place to block unwanted writes is the VDU, which is enforced however the request is made.
If we think there's a risk that this change breaks existing legitimate uses (as @nickva is saying), it might be best not to merge this change.
Sure, I think I get this usage scenario. The distinction I'm thinking about is that:
- The current contents of the VDU are mandatory security, i.e. the user can't be allowed to add a role they don't have to the context through any mechanism.
- The intended additions are advisory, the user is achieving nothing by changing _replication_ values and it is nice to inform them (but possibly not correct to stop a programmatic update as @nickva points out.)
I wouldn't have thought of _replicator document fields as a first priority in advisory validation, but I think there could be much more advisory information to add to make couchdb admin friendlier for adding users, design docs, etc. The fauxton client side and schema validation could better present growing amounts of non-critical advice/warnings/info and keep it out the VDUs to make it clear that they are just advice, aren't really part of any security model and maybe don't block save, etc.
@lostnet yeah, it might be nice, in general, to have a schema we can export or publish from an endpoint so then Fauxton could use it to validate input.
Specifically for the replicator's VDU, an improvement could be to port the BDU (the Erlang VDU) approach from main
. There we have a single module which both validates and parses the replication job. That reduce duplication, so we don't validate it twice, and, as a bonus, it avoids having to auto-inject the VDU design doc.
https://github.com/apache/couchdb/blob/main/src/couch_replicator/src/couch_replicator_parse.erl
@rnewson One way we could allow replicated changes and disallow interactive edits is by using _revisions
field as a heuristic. VDU functions don't get the replicated_changes | interactive_edit
flag passed in, but in most cases, we could detect the replicated changes by the presence of a "_revisions" : {"start": N, "ids: [Rev1, Rev2, ...]}
field. A bit too hackish perhaps?
@lostnet There's no mechanism to send an "advisory" though, otherwise I'd agree. I also don't know what percentage of users edit documents through Fauxton but anything we do here would need to work for those that don't.
The underscore-prefixed fields are reserved for system use (lots of examples of this and is documented). There's enforcement against misuse for most of them because they stem from the erlang code itself. It's been tricky to extend that protection inside documents.
I am not a huge fan of my proposed change, and the motivating case is a user that got caught out by this. It is the first time in my experience of couchdb (circa 11 years) where someone has thought they could edit the _replication_state field to achieve an effect (cancellation, in this case).
@nickva A bit too cute, imo.
We've had a good discussion but perhaps it is best to do nothing.