Chttpd request processing does not reset pdict predictably
Description
The Mochiweb http server uses a shared and re-used connection pool with chttpd:handle_request as the entry point for the Mochiweb request processing handoff.
The bug is that we don't start with a fresh process dictionary between requests, as the chttpd processes are reused so pdict values that are not expressly cleared will persist. Even worse, pdict values that are only set by specific code paths will persist until the next time an incoming request makes its way to this particular chttpd worker in the pool, at which point the old values will be updated.
An analogy would be if we used paper receipt to log each request, but instead of starting with a fresh form every time, we re-use the old form but only erase the old values when we need to modify them, so any other form entries would persist from whatever prior request filled them.
We also don't set the nonce value until https://github.com/apache/couchdb/blob/4e0b66a4f8ad78388d9520306608cab3ef7581b7/src/chttpd/src/chttpd.erl#L317, nearly 100 lines of code into the request handling, which means that any runtime errors happening prior to the setting of the nonce value will re-use the previously generated nonce value and return that as the "unique" value of the request, and will continue to return the existing value until a request is processed that reaches that line of code to generate a new nonce.
I propose we clear the pdict between requests. I don't think it's safe to do erlang:erase() as that'll also clear out the Mochiweb pdict values from https://github.com/apache/couchdb-mochiweb/blob/main/src/mochiweb_request.erl#L64-L78 so instead I propose a filtered clearing of the pdict values, like so:
diff --git a/src/chttpd/src/chttpd.erl b/src/chttpd/src/chttpd.erl
index ab8e1e9a3..ca55a8ac8 100644
--- a/src/chttpd/src/chttpd.erl
+++ b/src/chttpd/src/chttpd.erl
@@ -221,6 +221,7 @@ stop() ->
mochiweb_http:stop(?MODULE).
handle_request(MochiReq0) ->
+ couch_util:clear_pdict(), %% Make sure we start clean, everytime
erlang:put(?REWRITE_COUNT, 0),
MochiReq = couch_httpd_vhost:dispatch_host(MochiReq0),
handle_request_int(MochiReq).
diff --git a/src/couch/src/couch_util.erl b/src/couch/src/couch_util.erl
index 739df28e5..0bc9a94ba 100644
--- a/src/couch/src/couch_util.erl
+++ b/src/couch/src/couch_util.erl
@@ -46,6 +46,7 @@
-export([verify_hash_names/2]).
-export([get_config_hash_algorithms/0]).
-export([remove_sensitive_data/1]).
+-export([clear_pdict/0, clear_pdict/1]).
-include_lib("couch/include/couch_db.hrl").
@@ -870,3 +871,31 @@ remove_sensitive_data(KVList) ->
KVList1 = lists:keyreplace(<<"password">>, 1, KVList, {<<"password">>, <<"****">>}),
% some KVList entries are atoms, so test fo this too
lists:keyreplace(password, 1, KVList1, {password, <<"****">>}).
+
+-spec clear_pdict() -> ok.
+clear_pdict() ->
+ clear_pdict(erlang:get()).
+
+%% Exclude mochiweb markers, otherwise just use erlang:erase/0
+-spec clear_pdict(list()) -> ok.
+clear_pdict([]) ->
+ ok;
+clear_pdict([{mochiweb_request_body, _V} | Rest]) ->
+ clear_pdict(Rest);
+clear_pdict([{mochiweb_request_body_length, _V} | Rest]) ->
+ clear_pdict(Rest);
+clear_pdict([{mochiweb_request_cookie, _V} | Rest]) ->
+ clear_pdict(Rest);
+clear_pdict([{mochiweb_request_force_close, _V} | Rest]) ->
+ clear_pdict(Rest);
+clear_pdict([{mochiweb_request_path, _V} | Rest]) ->
+ clear_pdict(Rest);
+clear_pdict([{mochiweb_request_post, _V} | Rest]) ->
+ clear_pdict(Rest);
+clear_pdict([{mochiweb_request_qs, _V} | Rest]) ->
+ clear_pdict(Rest);
+clear_pdict([{mochiweb_request_recv, _V} | Rest]) ->
+ clear_pdict(Rest);
+clear_pdict([{Key, _V} | Rest]) ->
+ erlang:erase(Key),
+ clear_pdict(Rest).
Where the skipped values are the pdict names from the linked Mochiweb code above. I'm not entirely sure about the legality nuances from copying in the Mochiweb names, so before I PR'ed that I figured it would be to hear back from @janl / @rnewson / etc on preferred approach here.
Alternatively, we could explicitly delete the values set but I really dislike how that results in any values falling through persisting for an unknown amount of time until it's cleared out by a similar request. So I proposed the "clear everything but Mochiweb entries" approach. Let me know what you think.
reusing an erlang process between requests is surprising to me but I can't find docs that reassure me that this was ever a promise from mochiweb.
however, if true, it does point out our folly in using the process dictionary so freely given the blanket advice not to do so.
I'd rather see us fix the real problem(s) instead of this PR. Or, perhaps, a patch for mochiweb to ensure a new process at the start of each request, even for keepalive connections.
chatted with davisp and he confirms the general problem too. 🤷
he did have a better suggestion for fix. 1) record the pdict keys at the start of our handle_request 2) delete any keys not in that set at the end of handle_request (in an after). And, yes, I think we'd have to store that list in the pdict...
noting that neither davisp or I think that a mochiweb erlang process is reused across sockets though. If that is happening we have a big problem, as that would be state/data passing between different clients.
Yeah I was surprised as well about the chttpd process reuse. I stumbled upon this because I thought it was not doing that lol. @nickva mentioned we probably use the same processes to keep direct access to the same socket allowing for connection re-use from load balancers, so I didn't think injecting spawning of a new process would be a clean solution here.
Hmmmm... what do you consider "better" about that suggestion? It avoids the licensing and bleed over of including the names of the Mochiweb pdict values, but I personally prefer an explicit whitelist of the fields we will allow, rather than a dynamic approach that assumes the found live pdict value is allowed.
We already hardcode mochiweb in chttpd.erl, so it's not like the use of Mochiweb is pluggable. An alternative option would be to patch into Mochiweb a function that exposes a list of atoms used in the pdicts, like mochiweb:pdict_values_used() or something and then exclude that list. We could then achieve similar results as my suggested diff with [erlang:erase(K) || K <- erlang:get() -- mochiweb:pdict_values_used()].
I'm not really attached to any particular approach, but I don't really like the dynamic nature of the alternative suggestion.
my point is that whatever is in the pdict before handle_request is called is, by definition, mochiweb's entries. removing anything other than those at the end should avoid any "leak" now and in the future. your static approach invites problems down the line if mochiweb adds to its set and we forget to update our hard-coded list.
The true fix is to remove all reliance on the process dict in our chttpd handling.
have you a full list of potential issues caused by this process reuse across requests?
stashing away the mochiweb pdict entries and clearing everything except those at the end makes sense.
we may also have some stray messages in the mailbox and could be worth checking if we return early to the client while there are still workers finishing any doc updates or something like that
stashing away the mochiweb pdict entries and clearing everything except those at the end makes sense.
I think this makes sense, but how do you assert those pdict entries are for mochiweb without having a list of all mochiweb pdict entries or doing something we should avoid like string_starts_with("mochi", atom_to_list(Key)).