
Running out of disk space for CouchDB

Open guanfeix opened this issue 2 years ago • 8 comments

OS: Ubuntu 14.04.1
Kernel version: 4.19.90
Architecture: x86_64
CouchDB version: 1.6.1
Erlang version: R16B03

It's a single node with light traffic: about sixty databases, each with around ten view functions on average. Our CouchDB database directory is /var/lib/couchdb. After restarting CouchDB, df -h shows about 30G freed, even though du -sh shows no change in disk usage. This is how we found the problem (the root filesystem was running out of space): [screenshot] After restarting CouchDB: [screenshot]

I also found that the largest database has a leftover compaction file, dpl_job.compact, and its compaction never actually completes, so this database is never compacted and just keeps growing. I have tried every method to compact it: auto-compaction, the _compact URL, and Futon's compact and cleanup operations, both with and without deleting the .compact file first (deleting it only produces a new, smaller one). Below is a test on my own machine; the database there does not take nearly as much disk space, but the problem is the same: [screenshot]
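For anyone following along, this is how those compaction and cleanup operations are normally triggered over the CouchDB 1.x HTTP API; a minimal sketch, assuming a local node at http://127.0.0.1:5984, admin:secret credentials, and a design document named jobs (all placeholders):

```python
# Sketch: trigger database compaction, view compaction and view cleanup
# over the HTTP API. Server URL, credentials and the "jobs" design document
# are placeholders -- adjust for your setup.
import base64
import urllib.request

BASE = "http://127.0.0.1:5984"
AUTH = "Basic " + base64.b64encode(b"admin:secret").decode()

def post(path):
    req = urllib.request.Request(
        BASE + path,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": AUTH},
    )
    with urllib.request.urlopen(req) as resp:
        print(path, resp.status, resp.read().decode())

post("/dpl_job/_compact")         # compact the database file itself
post("/dpl_job/_compact/jobs")    # compact one design document's view indexes
post("/dpl_job/_view_cleanup")    # drop index files left by deleted design docs
```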

This may be related: [screenshot]

guanfeix avatar Mar 11 '22 10:03 guanfeix

Here is part of the log; because of the log level setting it may not be very detailed: couch-5985.log

guanfeix avatar Mar 11 '22 12:03 guanfeix

It looks like the compactor is failing to compact the database. It's hard to say immediately why, but it behaves as if documents appear in the sequence index but are missing from the main doc id index.

Perhaps try to stop any updates to the database, stop replications, and try to compact again manually. Otherwise, investigate whether there is perhaps a failing disk.

As this is 1.6, try to upgrade to the latest 1.7. Then try to replicate that db to a new copy. But seeing as there are missing document bodies that still appear in the changes feed, that might fail as well.
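Replicating to a fresh copy is a single POST to _replicate; a minimal sketch, assuming a local target database named dpl_job_copy and admin:secret credentials (both placeholders):

```python
# Sketch: replicate a possibly damaged database to a fresh local copy
# via the CouchDB 1.x _replicate endpoint. Names and credentials are
# placeholders -- adjust for your setup.
import base64
import json
import urllib.request

BASE = "http://127.0.0.1:5984"
AUTH = "Basic " + base64.b64encode(b"admin:secret").decode()

body = json.dumps({
    "source": "dpl_job",
    "target": "dpl_job_copy",
    "create_target": True,   # create the target database if it does not exist
}).encode()

req = urllib.request.Request(
    BASE + "/_replicate",
    data=body,
    method="POST",
    headers={"Content-Type": "application/json", "Authorization": AUTH},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))   # "ok" plus replication history on success
```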

nickva avatar Mar 11 '22 15:03 nickva

I tried stopping replication, deleting dpl_job.compact, and restarting CouchDB. This is the log I caught:

[Mon, 14 Mar 2022 02:55:56 GMT] [info] [<0.1327.0>] Starting compaction for db "dpl_job"
[Mon, 14 Mar 2022 02:55:56 GMT] [info] [<0.117.0>] 127.0.0.1 - - POST /dpl_job/_compact 202
[Mon, 14 Mar 2022 02:55:56 GMT] [error] [emulator] Error in process <0.1332.0> with exit value: {function_clause,[{couch_db_updater,'-copy_docs/4-fun-3-',[not_found],[{file,"couch_db_updater.erl"},{line,862}]},{lists,map,2,[{file,"lists.erl"},{line,1224}]},{lists,map,2,[{file,"lists.erl"},{line,1224}]},{couch_db_updater,copy_docs...

[Mon, 14 Mar 2022 02:55:56 GMT] [error] [<0.1327.0>] ** Generic server <0.1327.0> terminating ** Last message in was {'EXIT',<0.1332.0>, {function_clause, [{couch_db_updater,'-copy_docs/4-fun-3-', [not_found], [{file,"couch_db_updater.erl"}, {line,862}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {couch_db_updater,copy_docs,4, [{file,"couch_db_updater.erl"}, {line,861}]}, {couch_db_updater,copy_compact,3, [{file,"couch_db_updater.erl"}, {line,961}]}, {couch_db_updater,start_copy_compact,1, [{file,"couch_db_updater.erl"}, {line,1004}]}]}} ** When Server state == {db,<0.1326.0>,<0.1327.0>,<0.1332.0>, <<"1647226556413367">>,<0.1328.0>,<0.1324.0>, <0.1330.0>, {db_header,6,36085,0, {433507267,{55,456,65074},88060}, {433509119,543,43262}, {433522899,[],20453}, 140,417682521,nil,10}, 36085, {btree,<0.1324.0>, {433507267,{55,456,65074},88060}, #Fun<couch_db_updater.10.58444962>, #Fun<couch_db_updater.11.58444962>, #Fun<couch_btree.5.15886126>, #Fun<couch_db_updater.12.58444962>,snappy}, {btree,<0.1324.0>, {433509119,543,43262}, #Fun<couch_db_updater.13.58444962>, #Fun<couch_db_updater.14.58444962>, #Fun<couch_btree.5.15886126>, #Fun<couch_db_updater.15.58444962>,snappy}, {btree,<0.1324.0>, {433522899,[],20453}, #Fun<couch_btree.3.15886126>, #Fun<couch_btree.4.15886126>, #Fun<couch_btree.5.15886126>,nil,snappy}, 36085,<<"dpl_job">>, "/var/lib/couchdb-5985/dpl_job.couch",[],[],nil, {user_ctx,null,[],undefined}, nil,10, [before_header,after_header,on_file_open], [{user_ctx, {user_ctx,null, [<<"_admin">>], <<"{couch_httpd_auth, default_authentication_handler}">>}}], snappy,nil,nil} ** Reason for termination == ** {function_clause, [{couch_db_updater,'-copy_docs/4-fun-3-', [not_found], [{file,"couch_db_updater.erl"},{line,862}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {couch_db_updater,copy_docs,4, [{file,"couch_db_updater.erl"},{line,861}]}, {couch_db_updater,copy_compact,3, [{file,"couch_db_updater.erl"},{line,961}]}, {couch_db_updater,start_copy_compact,1, [{file,"couch_db_updater.erl"},{line,1004}]}]}

[Mon, 14 Mar 2022 02:55:56 GMT] [error] [<0.1327.0>] {error_report,<0.31.0>, {<0.1327.0>,crash_report, [[{initial_call, {couch_db_updater,init,['Argument__1']}}, {pid,<0.1327.0>}, {registered_name,[]}, {error_info, {exit, {function_clause, [{couch_db_updater,'-copy_docs/4-fun-3-', [not_found], [{file,"couch_db_updater.erl"},{line,862}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {couch_db_updater,copy_docs,4, [{file,"couch_db_updater.erl"},{line,861}]}, {couch_db_updater,copy_compact,3, [{file,"couch_db_updater.erl"},{line,961}]}, {couch_db_updater,start_copy_compact,1, [{file,"couch_db_updater.erl"},{line,1004}]}]}, [{gen_server,terminate,6, [{file,"gen_server.erl"},{line,744}]}, {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,239}]}]}}, {ancestors,[<0.1326.0>,<0.1323.0>]}, {messages,[]}, {links,[<0.1326.0>]}, {dictionary,[]}, {trap_exit,true}, {status,running}, {heap_size,4185}, {stack_size,27}, {reductions,1744}], []]}} [Mon, 14 Mar 2022 02:55:56 GMT] [error] [<0.1326.0>] ** Generic server <0.1326.0> terminating ** Last message in was {'EXIT',<0.1327.0>, {function_clause, [{couch_db_updater,'-copy_docs/4-fun-3-', [not_found], [{file,"couch_db_updater.erl"}, {line,862}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {couch_db_updater,copy_docs,4, [{file,"couch_db_updater.erl"}, {line,861}]}, {couch_db_updater,copy_compact,3, [{file,"couch_db_updater.erl"}, {line,961}]}, {couch_db_updater,start_copy_compact,1, [{file,"couch_db_updater.erl"}, {line,1004}]}]}} ** When Server state == {db,<0.1326.0>,<0.1327.0>,<0.1332.0>, <<"1647226556413367">>,<0.1328.0>,<0.1324.0>, <0.1330.0>, {db_header,6,36085,0, {433507267,{55,456,65074},88060}, {433509119,543,43262}, {433522899,[],20453}, 140,417682521,nil,10}, 36085, {btree,<0.1324.0>, {433507267,{55,456,65074},88060}, #Fun<couch_db_updater.10.58444962>, #Fun<couch_db_updater.11.58444962>, #Fun<couch_btree.5.15886126>, #Fun<couch_db_updater.12.58444962>,snappy}, {btree,<0.1324.0>, {433509119,543,43262}, #Fun<couch_db_updater.13.58444962>, #Fun<couch_db_updater.14.58444962>, #Fun<couch_btree.5.15886126>, #Fun<couch_db_updater.15.58444962>,snappy}, {btree,<0.1324.0>, {433522899,[],20453}, #Fun<couch_btree.3.15886126>, #Fun<couch_btree.4.15886126>, #Fun<couch_btree.5.15886126>,nil,snappy}, 36085,<<"dpl_job">>, "/var/lib/couchdb-5985/dpl_job.couch",[],[],nil, {user_ctx,null,[],undefined}, nil,10, [before_header,after_header,on_file_open], [{user_ctx, {user_ctx,null, [<<"_admin">>], <<"{couch_httpd_auth, default_authentication_handler}">>}}], snappy,nil,nil} ** Reason for termination == ** {function_clause, [{couch_db_updater,'-copy_docs/4-fun-3-', [not_found], [{file,"couch_db_updater.erl"},{line,862}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {couch_db_updater,copy_docs,4, [{file,"couch_db_updater.erl"},{line,861}]}, {couch_db_updater,copy_compact,3, [{file,"couch_db_updater.erl"},{line,961}]}, {couch_db_updater,start_copy_compact,1, [{file,"couch_db_updater.erl"},{line,1004}]}]}

[Mon, 14 Mar 2022 02:55:56 GMT] [error] [<0.1326.0>] {error_report,<0.31.0>, {<0.1326.0>,crash_report, [[{initial_call,{couch_db,init,['Argument__1']}}, {pid,<0.1326.0>}, {registered_name,[]}, {error_info, {exit, {function_clause, [{couch_db_updater,'-copy_docs/4-fun-3-', [not_found], [{file,"couch_db_updater.erl"},{line,862}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {lists,map,2,[{file,"lists.erl"},{line,1224}]}, {couch_db_updater,copy_docs,4, [{file,"couch_db_updater.erl"},{line,861}]}, {couch_db_updater,copy_compact,3, [{file,"couch_db_updater.erl"},{line,961}]}, {couch_db_updater,start_copy_compact,1, [{file,"couch_db_updater.erl"},{line,1004}]}]}, [{gen_server,terminate,6, [{file,"gen_server.erl"},{line,744}]}, {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,239}]}]}}, {ancestors,[<0.1323.0>]}, {messages,[]}, {links,[<0.88.0>]}, {dictionary,[]}, {trap_exit,true}, {status,running}, {heap_size,987}, {stack_size,27}, {reductions,308}], []]}} [Mon, 14 Mar 2022 02:55:56 GMT] [error] [<0.88.0>] Unexpected exit of database process <0.1326.0> [<<"dpl_job">>]: {function_clause, [{couch_db_updater, '-copy_docs/4-fun-3-', [not_found], [{file, "couch_db_updater.erl"}, {line, 862}]}, {lists, map, 2, [{file, "lists.erl"}, {line, 1224}]}, {lists, map, 2, [{file, "lists.erl"}, {line, 1224}]}, {couch_db_updater, copy_docs, 4, [{file, "couch_db_updater.erl"}, {line, 861}]}, {couch_db_updater, copy_compact, 3, [{file, "couch_db_updater.erl"}, {line, 961}]}, {couch_db_updater, start_copy_compact, 1, [{file, "couch_db_updater.erl"}, {line, 1004}]}]}

guanfeix avatar Mar 14 '22 03:03 guanfeix

It looks like the compactor is failing to compact the database. It's hard to say immediately why, but it behaves as if documents appear in the sequence index but are missing from the main doc id index.

Perhaps try to stop any updates to the database, stop replications, and try to compact again manually. Otherwise, investigate whether there is perhaps a failing disk.

As this is 1.6, try to upgrade to the latest 1.7. Then try to replicate that db to a new copy. But seeing as there are missing document bodies that still appear in the changes feed, that might fail as well.

Which version do you mean by the latest 1.7, and where can I get it? I tried 1.7.1 and copied the failed database over from 1.6.1, but it still fails to compact. The view compaction is OK. Where can I find old versions of CouchDB, such as a deb package for Ubuntu 14.04?

guanfeix avatar Mar 17 '22 05:03 guanfeix

Thanks for trying on 1.7. One last possible thing to try could be to replicate the database to a new database. If that fails as well, and there is a good chance it might, you can try doing an _all_docs query with include_docs=true and copy those documents manually to a new database (with a script).
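A rough sketch of such a script, using only the documented _all_docs and _bulk_docs endpoints; the database names and credentials are placeholders, and error handling is deliberately minimal:

```python
# Sketch: copy readable documents out of a damaged database via
# _all_docs?include_docs=true and write them into a fresh database with
# _bulk_docs (CouchDB 1.x HTTP API). Names and credentials are placeholders.
import base64
import json
import urllib.error
import urllib.request

BASE = "http://127.0.0.1:5984"
AUTH = "Basic " + base64.b64encode(b"admin:secret").decode()
SOURCE, TARGET = "dpl_job", "dpl_job_copy"

def request(method, path, body=None):
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        BASE + path, data=data, method=method,
        headers={"Content-Type": "application/json", "Authorization": AUTH},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Create the target database; 412 means it already exists.
try:
    request("PUT", "/" + TARGET)
except urllib.error.HTTPError as err:
    if err.code != 412:
        raise

# Read every readable document body from the source...
rows = request("GET", "/" + SOURCE + "/_all_docs?include_docs=true")["rows"]
docs = []
for row in rows:
    doc = row.get("doc")
    if doc is None:          # body missing or unreadable -- skip it
        continue
    doc.pop("_rev", None)    # drop the old revision so the target assigns fresh ones
    docs.append(doc)

# ...and write them into the target in one batch.
print(request("POST", "/" + TARGET + "/_bulk_docs", {"docs": docs}))
```

For a database with only a few dozen documents a single _all_docs call is enough; a larger database would need to page through with limit and startkey.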

Since Ubuntu 14.04 Trusty Tahr (https://help.ubuntu.com/community/EOL#Ubuntu_14.04_Trusty_Tahr) reached end of life 3 years ago, we don't support it any longer.

nickva avatar Mar 17 '22 06:03 nickva

Thanks for trying on 1.7. One last possible thing to try could be to replicate the database to a new database. If that fails as well, and there is a good chance it might, you can try doing an _all_docs query with include_docs=true and copy those documents manually to a new database (with a script).

Since Ubuntu 14.04 Trusty Tahr (https://help.ubuntu.com/community/EOL#Ubuntu_14.04_Trusty_Tahr) reached end of life 3 years ago, we don't support it any longer.

Replicating the database works, so the problem can be solved that way. But we still want to know how to avoid this problem, because it happened at a client site. I hope someone can help me with the questions below about this problem.

  1. If we don't do anything about this database, will it keep growing until it uses up the remaining space on its filesystem?
  2. Why does this problem happen? There are only 40 documents, yet they take up so much disk space. Is there a problem with the way we use CouchDB?
  3. Finally, I want to know whether 1.7.1 contains the commit for this issue: https://github.com/apache/couchdb/issues/1001. It may be the same underlying problem.
  4. If 1.7.1 doesn't contain that commit, I will try to install from source code; the latest 1.7 will be my best choice, whether from GitHub or the official website.

[screenshot]

guanfeix avatar Mar 18 '22 05:03 guanfeix

Do you have any document attachments in that database, or any views that emit a lot of rows and/or data, or a lot of deleted/removed documents? I think those factors could explain the disk usage.

Also, what is the disk usage of the replicated database? Is it similar to the original one?
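Those numbers can be read directly from the database and view-index metadata; a minimal sketch, assuming a local 1.x node, admin:secret credentials, and a design document named jobs (all placeholders):

```python
# Sketch: inspect what is actually using the space (CouchDB 1.x HTTP API).
# GET /{db} reports doc_count, doc_del_count, disk_size and data_size;
# GET /{db}/_design/{ddoc}/_info reports the on-disk size of that view index.
# Credentials, database and design-document names are placeholders.
import base64
import json
import urllib.request

BASE = "http://127.0.0.1:5984"
AUTH = "Basic " + base64.b64encode(b"admin:secret").decode()

def get(path):
    req = urllib.request.Request(BASE + path, headers={"Authorization": AUTH})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

info = get("/dpl_job")
print("docs:", info["doc_count"], "deleted docs:", info["doc_del_count"])
print("file size on disk:", info["disk_size"], "live data:", info.get("data_size"))

ddoc = get("/dpl_job/_design/jobs/_info")
print("view index size on disk:", ddoc["view_index"]["disk_size"])
```

A large gap between disk_size and data_size generally just means the file needs compaction, while a high doc_del_count or attachment-heavy documents would point at the data itself.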

rudasn avatar Apr 19 '22 07:04 rudasn

Do you have any document attachments in that database, or any views that emit a lot of rows and/or data, or a lot of deleted/removed documents? I think those factors could explain the disk usage.

Also, what is the disk usage of the replicated database? Is it similar to the original one?

Yes, there are a lot of views in some of the most commonly used databases, and they may cache a lot of data. I also learned about the difference between the df and du commands: it comes down to phantom (deleted-but-still-open) files. When I ran lsof | grep deleted, I found a lot of such deleted files on Ubuntu. But how can I avoid that kind of usage? As for the replicated database, there is little difference. [screenshot] [screenshot]

[screenshot]
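For context, the "phantom" files are files that were unlinked while a process (here CouchDB's beam.smp) still holds them open, which is why df only recovers the space after a restart. The lsof | grep deleted result can be reproduced with a small script; a sketch, assuming a Linux /proc filesystem:

```python
# Sketch: list deleted-but-still-open files, the "phantom" files that make
# df and du disagree. Equivalent in spirit to `lsof | grep deleted` on Linux.
# Requires read access to /proc (run as root to see the couchdb process).
import os

for pid in (p for p in os.listdir("/proc") if p.isdigit()):
    fd_dir = "/proc/%s/fd" % pid
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        continue                         # process exited or permission denied
    for fd in fds:
        path = os.path.join(fd_dir, fd)
        try:
            target = os.readlink(path)
            if target.endswith(" (deleted)"):
                size = os.stat(path).st_size
                print(pid, target, size)  # space is reclaimed only when this fd closes
        except OSError:
            continue
```

Space held this way is released as soon as the process holding the descriptor exits or closes the file, which matches the ~30G that reappeared after restarting CouchDB.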

guanfeix avatar Apr 28 '22 07:04 guanfeix