CouchDB instance without search node crashes when calling a search request
Description
CouchDB crashes if I bombard it with search requests when the search node does not exist. See this gist file to check the logs.
Steps to Reproduce
I could only reproduce the issue when I ran unit tests.
- Start a plain CouchDB instance that does not contain the _search functionality. I used the latest docker image.
- Start the CloudantDatabaseTests in the tests/unit/database_tests.py file from the cloudant/python-cloudant repo. I tried to skip as many tests as possible, so if you only keep these tests, you can probably reproduce the crash:
- https://github.com/cloudant/python-cloudant/blob/master/tests/unit/database_tests.py#L1081-L1121
- https://github.com/cloudant/python-cloudant/blob/master/tests/unit/database_tests.py#L1618-L1643
- https://github.com/cloudant/python-cloudant/blob/master/tests/unit/database_tests.py#L1645-L1677
Expected Behaviour
CouchDB should not crash.
Your Environment
- CouchDB version used: 3.1.1
- Browser name and version: not a browser, I tried it via python-cloudant (master)
- Operating system and version: macOS Big Sur, 11.3.1
Additional Context
You have to set the following env vars when starting the nosetest:
DB_USER={uname};DB_PASSWORD={pwd};DB_URL=http://127.0.0.1:5984;RUN_CLOUDANT_TESTS=false
All right, so the strange part is that I can reproduce this on couchdb:3 docker image with the following ddoc, but can't reproduce when I'm building from repo's 3.x branch (nor from tag 3.1.1) and changing dev/run's .ini configs to match docker's.
File: alpha.json
{
"indexes": {
"searchindex001": {
"index": "function(doc) { index(\"default\", doc._id); }"
}
}
}
Against docker:
$ curl -q -K .curlrc http://127.0.0.1:15984/koi -X PUT
{"ok":true}
$ curl -q -K .curlrc http://127.0.0.1:15984/koi -X POST -d '{"name": "Alice", "number": 42}'
{"ok":true,"id":"d9d564521ee139ec83134600f4000e0f","rev":"1-7b54ca479c9043f8cc4cd83777ce6b75"}
$ curl -q -K .curlrc http://127.0.0.1:15984/koi/_design/alpha -X PUT --data @alpha.json
{"ok":true,"id":"_design/alpha","rev":"1-b445a33cf17da10d2f2502f68f58462b"}
$ curl -q -K .curlrc http://127.0.0.1:15984/koi/_design/alpha/_search/searchindex001 -X POST -d '{"query": "name:Alice*"}'
{"error":"{badarg,[{erlang,monitor,[process,{main,'[email protected]'}],[]},\n {ioq,submit_request,2,[{file,\"src/ioq.erl\"},{line,187}]},\n {ioq,maybe_submit_request,1,[{file,\"src/ioq.erl\"},{line,150}]},\n {ioq,handle_info,2,[{file,\"src/ioq.erl\"},{line,123}]},\n {gen_server,try_dispatch,4,[{file,\"gen_server.erl\"},{line,616}]},\n {gen_server,handle_msg,6,[{file,\"gen_server.erl\"},{line,686}]},\n {proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,247}]}]}","reason":"{gen_server,call,\n [ioq,\n {request,<0.286.0>,{pread_iolist,4490},other,<0.435.0>,undefined},\n infinity]}","ref":2383229562}
Against repo:
$ curl -q -K .curlrc http://127.0.0.1:15984/koi -X PUT
{"ok":true}
$ curl -q -K .curlrc http://127.0.0.1:15984/koi -X POST -d '{"name": "Alice", "number": 42}'
{"ok":true,"id":"d9d564521ee139ec83134600f4000e0f","rev":"1-7b54ca479c9043f8cc4cd83777ce6b75"}
$ curl -q -K .curlrc http://127.0.0.1:15984/koi/_design/alpha -X PUT --data @alpha.json
{"ok":true,"id":"_design/alpha","rev":"1-b445a33cf17da10d2f2502f68f58462b"}
$ curl -q -K .curlrc http://127.0.0.1:15984/koi/_design/alpha/_search/searchindex001 -X POST -d '{"query": "name:Alice*"}'
{"error":"ou_est_clouseau","reason":"Could not connect to the Clouseau Java service at [email protected]"}
Since HEAD behaves as expected and returns the ou_est_clouseau error, I suspect this is some kind of configuration issue, but I can't figure out which values in the ini config would reproduce it.
If anyone manages to induce this locally in a dev environment, please leave the steps in a comment.
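For anyone scripting the comparison between the two setups, a small helper can tell the two responses apart. This is only a sketch based on the two bodies pasted above: the function name and the returned labels are made up for illustration, and the matching on the `badarg` text is a heuristic, not something CouchDB guarantees.

```python
import json


def classify_search_error(body: str) -> str:
    """Classify a `_search` response body into the cases seen in this thread.

    The labels returned here are hypothetical names, not CouchDB terminology.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return "not_json"
    if not isinstance(payload, dict) or "error" not in payload:
        return "no_error"

    error = str(payload["error"])
    if error == "ou_est_clouseau":
        # Expected behaviour: CouchDB reports that Clouseau is unreachable.
        return "clouseau_unreachable"
    if error.startswith("{badarg") and "erlang,monitor" in error:
        # The failure from the docker image: erlang:monitor/2 raised badarg.
        return "monitor_badarg"
    return "other_error"
```

Feeding it the body returned by the final curl call in each transcript should give `monitor_badarg` for the docker image and `clouseau_unreachable` for the repo build.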
Intriguing! I think I figured it out. The difference is that the couchdb:3 container is configured without any node name and is not running in distributed mode. In that mode trying to monitor the remote Clouseau process triggers a badarg error:
# /opt/couchdb/erts-9.3.3.14/bin/erl -boot /opt/couchdb/releases/3.1.1/start_clean
Erlang/OTP 20 [erts-9.3.3.14] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V9.3.3.14 (abort with ^G)
1> erlang:monitor(process, {main, '[email protected]'}).
** exception error: bad argument
in function monitor/2
called as monitor(process,{main,'[email protected]'})
whereas in dev mode we are running a CouchDB Erlang node and the behavior changes to deliver a 'DOWN' message to the mailbox:
# /opt/couchdb/erts-9.3.3.14/bin/erl -boot /opt/couchdb/releases/3.1.1/start_clean -name [email protected]
Erlang/OTP 20 [erts-9.3.3.14] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V9.3.3.14 (abort with ^G)
([email protected])1> erlang:monitor(process, {main, '[email protected]'}).
#Ref<0.2127152124.2865758209.248364>
([email protected])2> receive M -> M end.
{'DOWN',#Ref<0.2127152124.2865758209.248364>,process,
{main,'[email protected]'},
noconnection}
([email protected])3>
Not sure offhand what the right fix is here. Seems like we ought to be able to run gracefully without the Erlang distribution, although it's news to me that the Docker image is configured that way.
From the Docker container README:
CouchDB also uses /opt/couchdb/etc/vm.args to store Erlang runtime-specific changes. Changing these values is less common. If you need to change the epmd port, for instance, you will want to bind mount this file as well. (Note: files cannot be bind-mounted on Windows hosts.)
and
NODENAME will set the name of the CouchDB node inside the container to couchdb@${NODENAME}, in the file /opt/couchdb/etc/vm.args. This is used for clustering purposes and can be ignored for single-node setups.
Try setting NODENAME?
Hi @wohali yes, I can confirm that starting the container with a NODENAME specified restores the intended behavior. It just seems to me that we'd want to be better behaved inside CouchDB itself when running in non-distributed mode, but I'm not sure about the best way to effect that change.
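To check whether a given install is in the non-distributed state described above, one option is to look for a node name flag in vm.args. The following is a rough sketch, assuming the node name is set with Erlang's `-name` or `-sname` flag on its own line and that `#` starts a comment; the function name is hypothetical.

```python
def node_name_from_vm_args(text: str):
    """Return the value of the -name/-sname flag in vm.args text, or None.

    None suggests the VM starts without a node name, i.e. not distributed.
    """
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0] in ("-name", "-sname") and len(parts) >= 2:
            return parts[1]
    return None
```

For example, it could be run on the contents of /opt/couchdb/etc/vm.args copied out of the container; a `None` result would match the setup where the `badarg` error was observed.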
Fixed by https://github.com/apache/couchdb/pull/4404