no match of right hand value {error,enospc}

Open SourceR85 opened this issue 1 year ago • 8 comments

Description

I've set up a fresh CouchDB 3.4.1 instance (as a Docker image, built from https://github.com/apache/couchdb-docker/tree/main/3.4.1). Then I started a replication from the production server and saw endless messages of "no match of right hand value {error,enospc}".

Here is a (truncated) copy of the Docker log: couchdb.tar.gz

Your Environment

  • CouchDB version used: version: 3.4.1
    "git_sha": "f504e38a5",
    "features": [
        "nouveau",
        "access-ready",
        "partitioned",
        "pluggable-storage-engines",
        "reshard",
        "scheduler"
    ]
  • Operating system and version: Fedora Linux 40 (KDE Plasma) and Ubuntu Server 24 (both run the same Docker image and report the same error)

Additional Context

Docker Engine
 Version:    27.3.1
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15

I've talked a bit with Jan on Slack; his first thoughts: https://app.slack.com/client/T49P1AZRT/C49LEE7NW

SourceR85, Sep 30 '24 13:09

enospc from no match of right hand value {error,enospc} indicates we're probably running out of disk space [1]

It should be a more friendly message in the log, but at least at first sight that's what's jumping out.

[1] https://www.man7.org/linux/man-pages/man3/errno.3.html
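
To make the connection between the log message and the errno table concrete, here is a minimal sketch in Python (not part of CouchDB). It assumes the lowercase atom in the Erlang reason tuple is the POSIX error name, which is the case for enospc; the helper name `explain_posix_reason` is hypothetical.

```python
import errno
import os
import re

# Matches the reason tuple in messages such as
# "no match of right hand value {error,enospc}".
_REASON = re.compile(r"\{error,\s*([a-z0-9_]+)\}")


def explain_posix_reason(message):
    """Return (atom, errno number, OS description), or None if not recognised."""
    match = _REASON.search(message)
    if match is None:
        return None
    atom = match.group(1)
    # Assumption: the atom is a lowercase POSIX errno name (enospc -> ENOSPC).
    code = getattr(errno, atom.upper(), None)
    if not isinstance(code, int):
        return None  # not an errno name known on this platform
    return atom, code, os.strerror(code)


print(explain_posix_reason("no match of right hand value {error,enospc}"))
```

On Linux the description for ENOSPC is "No space left on device"; the exact wording comes from the operating system, so it can differ elsewhere.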

nickva, Sep 30 '24 14:09

enospc from no match of right hand value {error,enospc} indicates we're probably running out of disk space

~That's not a problem...~ I have 799.7 GB of 2TB free (the DB I replicate is 86.1GB)

SourceR85, Sep 30 '24 14:09

Is there any chance the view directory is configured to write to another disk, or that the disks failed to mount so it ends up writing to the root file system? enospc is usually a transparent pass-through error from the FS layer.

The first instance in the logs seems to come from writing an attachment:

gen,do_call,4,[{file,"gen.erl"},{line,237}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,381}]},
{couch_att,write_streamed_attachment,3,

Is there a way to reconfigure the data directory or point it to another volume? Or can you test whether you can write to it manually? Verify that the data directory really points to the large mounted volume; misconfigurations sometimes happen, and I've seen writes going to a directory other than the intended one.
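
One way to run the manual write test is sketched below in Python. It is only an illustration under stated assumptions: the helper name `probe_data_dir` and the 1 MiB payload size are made up for this example, and the script has to run somewhere the data volume is mounted and Python is available. It reports the free space of the filesystem that holds the directory, then writes and syncs real bytes instead of only creating an empty file.

```python
import os
import shutil
import tempfile


def probe_data_dir(path, payload_bytes=1024 * 1024):
    """Report free space for `path` and try to write real data into it."""
    usage = shutil.disk_usage(path)  # statistics for the filesystem holding `path`
    result = {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "write_ok": False,
        "error": None,
    }
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path, prefix="write-probe-") as handle:
            handle.write(os.urandom(payload_bytes))  # real data, not an empty file
            handle.flush()
            os.fsync(handle.fileno())  # force the data out to the device
        result["write_ok"] = True
    except OSError as exc:
        # A full or capped volume is expected to surface here as ENOSPC.
        result["error"] = exc.strerror or str(exc)
    return result
```

Calling `probe_data_dir("/opt/couchdb/data")` would be the obvious usage if that is where the volume is mounted; that path is an assumption here, so substitute the directory your deployment actually uses.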

nickva, Sep 30 '24 14:09

As you expected: the Docker volume got stuck... Can't write content into data (just touch file works)

This is my Docker deployment (secrets removed): couchdb.tar.gz. There's nothing fancy in it, as far as I can tell...

SourceR85, Sep 30 '24 15:09

Can't write content into data (just touch file works)

That would explain it, I think. Good find. It's sneaky that touch works though.

nickva, Sep 30 '24 16:09

Just out of curiosity, I stopped the container, removed and recreated couchdb-data, and started the replication again: same result...

[notice] 2024-09-30T16:14:19.553744Z nonode@nohost <0.14636.101> -------- Retrying POST request to http://localhost:5984/hzd/_bulk_docs in 4.0 seconds due to error {code,500}
[error] 2024-09-30T16:14:19.574327Z nonode@nohost <0.16657.101> d5dfe20e02 rexi_server: from: nonode@nohost(<0.19120.101>) mfa: fabric_rpc:update_docs/3 exit:{{badmatch,{error,enospc}},[{couch_bt_engine,write_doc_body,2,[{file,"src/couch_bt_engine.erl"},{line,439}]},{couch_db_updater,'-flush_trees/3-fun-0-',6,[{file,"src/couch_db_updater.erl"},{line,384}]},{couch_key_tree,mapfold_simple,4,[{file,"src/couch_key_tree.erl"},{line,464}]},{couch_key_tree,mapfold_simple,4,[{file,"src/couch_key_tree.erl"},{line,473}]},{couch_key_tree,mapfold,3,[{file,"src/couch_key_tree.erl"},{line,457}]},{couch_db_updater,flush_trees,3,[{file,"src/couch_db_updater.erl"},{line,373}]},{couch_db_updater,update_docs_int,4,[{file,"src/couch_db_updater.erl"},{line,718}]},{couch_db_updater,handle_info,2,[{file,"src/couch_db_updater.erl"},{line,183}]}]} [{couch_db,collect_results,3,[{file,"src/couch_db.erl"},{line,1457}]},{couch_db,collect_results_with_metrics,3,[{file,"src/couch_db.erl"},{line,1439}]},{couch_db,write_and_commit,4,[{file,"src/couch_db.erl"},{line,1471}]},{couch_db,update_docs,4,[{file,"src/couch_db.erl"},{line,1333}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,360}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,141}]}]
[info] 2024-09-30T16:14:19.574423Z nonode@nohost <0.243.0> -------- db shards/e0000000-ffffffff/hzd.1727710380 died with reason {{badmatch,{error,enospc}},[{couch_bt_engine,write_doc_body,2,[{file,"src/couch_bt_engine.erl"},{line,439}]},{couch_db_updater,'-flush_trees/3-fun-0-',6,[{file,"src/couch_db_updater.erl"},{line,384}]},{couch_key_tree,mapfold_simple,4,[{file,"src/couch_key_tree.erl"},{line,464}]},{couch_key_tree,mapfold_simple,4,[{file,"src/couch_key_tree.erl"},{line,473}]},{couch_key_tree,mapfold,3,[{file,"src/couch_key_tree.erl"},{line,457}]},{couch_db_updater,flush_trees,3,[{file,"src/couch_db_updater.erl"},{line,373}]},{couch_db_updater,update_docs_int,4,[{file,"src/couch_db_updater.erl"},{line,718}]},{couch_db_updater,handle_info,2,[{file,"src/couch_db_updater.erl"},{line,183}]}]}
[error] 2024-09-30T16:14:19.574887Z nonode@nohost <0.18010.101> -------- gen_server <0.18010.101> terminated with reason: no match of right hand value {error,enospc} at couch_bt_engine:write_doc_body/2(line:439) <= couch_db_updater:'-flush_trees/3-fun-0-'/6(line:384) <= couch_key_tree:mapfold_simple/4(line:464) <= couch_key_tree:mapfold_simple/4(line:473) <= couch_key_tree:mapfold/3(line:457) <= couch_db_updater:flush_trees/3(line:373) <= couch_db_updater:update_docs_int/4(line:718) <= couch_db_updater:handle_info/2(line:183)
  last msg: redacted
     state: {db,1,<<"shards/e0000000-ffffffff/hzd.1727710380">>,"./data/shards/e0000000-ffffffff/hzd.1727710380.couch",{couch_bt_engine,{st,"./data/shards/e0000000-ffffffff/hzd.1727710380.couch",<0.19406.101>,#Ref<0.3603940510.502005771.203208>,undefined,{db_header,8,30406,0,{9450247660,{29670,687,{size_info,9279630171,9278136634}},12600491},{9450249167,30357,11927090},{9448039553,[],2388},nil,nil,4251,1000,<<"2719778795232e78e860e5e8ab70c794">>,[{nonode@nohost,0}],0,1000,0},false,{btree,<0.19406.101>,{9450247660,{29670,687,{size_info,9279630171,9278136634}},12600491},fun couch_bt_engine:id_tree_split/1,fun couch_bt_engine:id_tree_join/2,undefined,fun couch_bt_engine:id_tree_reduce/2,snappy},{btree,<0.19406.101>,{9450249167,30357,11927090},fun couch_bt_engine:seq_tree_split/1,fun couch_bt_engine:seq_tree_join/2,undefined,fun couch_bt_engine:seq_tree_reduce/2,snappy},{btree,<0.19406.101>,{9448039553,[],2388},fun couch_bt_engine:local_tree_split/1,fun couch_bt_engine:local_tree_join/2,undefined,nil,snappy},snappy,{btree,<0.19406.101>,nil,fun couch_bt_engine:purge_tree_split/1,fun couch_bt_engine:purge_tree_join/2,undefined,fun couch_bt_engine:purge_tree_reduce/2,snappy},{btree,<0.19406.101>,nil,fun couch_bt_engine:purge_seq_tree_split/1,fun couch_bt_engine:purge_seq_tree_join/2,undefined,fun couch_bt_engine:purge_tree_reduce/2,snappy}}},<0.18010.101>,nil,30406,<<"1727712856444764">>,{user_ctx,null,[],undefined},[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}],[#Fun<couch_doc.7.91987333>],nil,nil,undefined,[{default_security_object,[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},replicated_changes,{user_ctx,{user_ctx,<<"groot">>,[<<"_admin">>],<<"cookie">>}},{w,"1"},{props,[{partitioned,true},{hash,[couch_partition,hash,[]]}]}],undefined}
    extra: []
[notice] 2024-09-30T16:14:19.574938Z nonode@nohost <0.19120.101> d5dfe20e02 localhost:5984 127.0.0.1 groot POST /hzd/_bulk_docs 500 ok 21
[error] 2024-09-30T16:14:19.575102Z nonode@nohost <0.18010.101> -------- gen_server <0.18010.101> terminated with reason: no match of right hand value {error,enospc} at couch_bt_engine:write_doc_body/2(line:439) <= couch_db_updater:'-flush_trees/3-fun-0-'/6(line:384) <= couch_key_tree:mapfold_simple/4(line:464) <= couch_key_tree:mapfold_simple/4(line:473) <= couch_key_tree:mapfold/3(line:457) <= couch_db_updater:flush_trees/3(line:373) <= couch_db_updater:update_docs_int/4(line:718) <= couch_db_updater:handle_info/2(line:183)
  last msg: redacted
     state: {db,1,<<"shards/e0000000-ffffffff/hzd.1727710380">>,"./data/shards/e0000000-ffffffff/hzd.1727710380.couch",{couch_bt_engine,{st,"./data/shards/e0000000-ffffffff/hzd.1727710380.couch",<0.19406.101>,#Ref<0.3603940510.502005771.203208>,undefined,{db_header,8,30406,0,{9450247660,{29670,687,{size_info,9279630171,9278136634}},12600491},{9450249167,30357,11927090},{9448039553,[],2388},nil,nil,4251,1000,<<"2719778795232e78e860e5e8ab70c794">>,[{nonode@nohost,0}],0,1000,0},false,{btree,<0.19406.101>,{9450247660,{29670,687,{size_info,9279630171,9278136634}},12600491},fun couch_bt_engine:id_tree_split/1,fun couch_bt_engine:id_tree_join/2,undefined,fun couch_bt_engine:id_tree_reduce/2,snappy},{btree,<0.19406.101>,{9450249167,30357,11927090},fun couch_bt_engine:seq_tree_split/1,fun couch_bt_engine:seq_tree_join/2,undefined,fun couch_bt_engine:seq_tree_reduce/2,snappy},{btree,<0.19406.101>,{9448039553,[],2388},fun couch_bt_engine:local_tree_split/1,fun couch_bt_engine:local_tree_join/2,undefined,nil,snappy},snappy,{btree,<0.19406.101>,nil,fun couch_bt_engine:purge_tree_split/1,fun couch_bt_engine:purge_tree_join/2,undefined,fun couch_bt_engine:purge_tree_reduce/2,snappy},{btree,<0.19406.101>,nil,fun couch_bt_engine:purge_seq_tree_split/1,fun couch_bt_engine:purge_seq_tree_join/2,undefined,fun couch_bt_engine:purge_tree_reduce/2,snappy}}},<0.18010.101>,nil,30406,<<"1727712856444764">>,{user_ctx,null,[],undefined},[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}],[#Fun<couch_doc.7.91987333>],nil,nil,undefined,[{default_security_object,[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},replicated_changes,{user_ctx,{user_ctx,<<"groot">>,[<<"_admin">>],<<"cookie">>}},{w,"1"},{props,[{partitioned,true},{hash,[couch_partition,hash,[]]}]}],undefined}
    extra: []
[error] 2024-09-30T16:14:19.575128Z nonode@nohost <0.14636.101> -------- Replicator, request POST to "http://localhost:5984/hzd/_bulk_docs" failed due to error {code,500}
[error] 2024-09-30T16:14:19.575198Z nonode@nohost <0.18010.101> -------- CRASH REPORT Process  (<0.18010.101>) with 0 neighbors crashed with reason: no match of right hand value {error,enospc} at couch_bt_engine:write_doc_body/2(line:439) <= couch_db_updater:'-flush_trees/3-fun-0-'/6(line:384) <= couch_key_tree:mapfold_simple/4(line:464) <= couch_key_tree:mapfold_simple/4(line:473)

SourceR85, Sep 30 '24 16:09

My fault: I'm using Docker Desktop, and its maximum storage capacity was globally set to 100 GB. The source (CouchDB 3.3.3) is running in parallel so that I can replicate from it... I had assumed that I was running Docker without limits.

So nickva spotted it right in his first comment:

enospc from no match of right hand value {error,enospc} indicates we're probably running out of disk space [1]

It should be a more friendly message in the log, but at least at first sight that's what's jumping out.

[1] https://www.man7.org/linux/man-pages/man3/errno.3.html

Two ideas for improvement came out of my mistake:

  1. A more user friendly error message than {error,enospc}.
  2. Quit CouchDB on that error (since health checks run fine as long as the endpoints are reachable) or report an unhealthy status in the _up endpoint (507 Insufficient Storage may fit this purpose).
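
The second idea can be approximated outside CouchDB with an external check. The Python sketch below only illustrates the proposal and does not describe how the _up endpoint behaves: the function name `storage_health_status`, the 1 GiB default threshold, and the injectable `disk_usage` parameter are all assumptions made for this example.

```python
import shutil
from http import HTTPStatus


def storage_health_status(data_dir, min_free_bytes=1024 ** 3, disk_usage=shutil.disk_usage):
    """Map free space on the filesystem holding `data_dir` to an HTTP-style status."""
    free = disk_usage(data_dir).free
    if free >= min_free_bytes:
        return HTTPStatus.OK
    # Hypothetical policy: report 507 Insufficient Storage below the threshold.
    return HTTPStatus.INSUFFICIENT_STORAGE
```

A container health check could call this alongside the usual reachability probe and fail when either one reports a problem. The `disk_usage` parameter exists only so the function can be exercised without a full disk.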

SourceR85, Sep 30 '24 20:09

No worries at all, thanks for reaching out.

Yeah, agreed, a friendlier error would be nice in the logs.

And it turns out we do have a disk monitor now in 3.4 (the work of @rnewson)!

See https://docs.couchdb.org/en/stable/config/disk-monitor.html: if you configure it, it will stop indexing when approaching the limit and return a meaningful API error.

See https://github.com/apache/couchdb/pull/4681 for the PR comments and the implementation.

nickva, Sep 30 '24 20:09