Nouveau Availability
Hello.
We are planning to deploy CouchDB to all our customers' sites, so we are evaluating how we will roll out CouchDB. It would be good to know when we can expect version 3.4.0. Is there any possible timeframe known at this point in time? And will Nouveau be included (as stable) in 3.4.0?
Thanks for your help.
Nouveau will be included in the 3.4.0 release.
There is no set timeframe, but currently the main thing we're waiting on is review of the Nouveau deb packaging PR in https://github.com/apache/couchdb-pkg/pull/125. I am planning to review it this week or over the weekend.
Thanks, this sounds good to me. I have a second question about Nouveau. We want one or more CouchDB nodes, each with a corresponding Nouveau node, in several of our customers' offices. We need to sync the data between the nodes, and everything should keep working on every node even if the network between the nodes is lost; the nodes should get back in sync once the network is restored. We use clustering with n = NODE_COUNT and q = 1, so that every node holds all the data. This seems to work as expected, but Nouveau doesn't work if any node is down. We get the following message if a node (in this example [email protected]) is down; note that it is not included in the error response below:
"error": "badrecord",
"reason": "[{{shard,<<\"shards/00000000-ffffffff/foo.1710344337\">>,\n '[email protected]',<<\"foo\">>,\n [0,4294967295],\n #Ref<0.1884847839.2480668675.227080>,\n [{props,[]}]},\n nil},\n {{shard,<<\"shards/00000000-ffffffff/foo.1710344337\">>,\n '[email protected]',<<\"foo\">>,\n [0,4294967295],\n #Ref<0.1884847839.2480668675.227078>,\n [{props,[]}]},\n nil}]",
"ref": 3715306381
This is the log excerpt from couchdb-3, the server that received the request:
[error] 2024-03-14T08:26:05.582072Z [email protected] <0.902.0> 844d099269 req_err(3715306381) badrecord : [{{shard,<<"shards/00000000-ffffffff/foo.1710344337">>,
'[email protected]',<<"foo">>,
[0,4294967295],
#Ref<0.1884847839.2480668675.227080>,
[{props,[]}]},
nil},
{{shard,<<"shards/00000000-ffffffff/foo.1710344337">>,
'[email protected]',<<"foo">>,
[0,4294967295],
#Ref<0.1884847839.2480668675.227078>,
[{props,[]}]},
nil}]
[<<"nouveau_fabric_search:handle_message/3 L84">>,<<"rexi_utils:process_mailbox/6 L55">>,<<"nouveau_fabric_search:go/4 L64">>,<<"nouveau_httpd:handle_search_req/6 L103">>,<<"nouveau_httpd:handle_search_req/3 L56">>,<<"chttpd:handle_req_after_auth/2 L416">>,<<"chttpd:process_request/1 L394">>,<<"chttpd:handle_request_int/1 L329">>]
[notice] 2024-03-14T08:26:05.582265Z [email protected] <0.902.0> 844d099269 10.249.4.203:5984 192.168.244.83 admin GET /foo/_design/foo/_nouveau/search?q=_id:doc1706* 500 ok 38
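For reference, the failing request can be reproduced like this (a minimal Python sketch; the host and query are taken from the log above, the credentials are placeholders):

import requests

# Mirrors the request from the log above:
# GET /foo/_design/foo/_nouveau/search?q=_id:doc1706*
resp = requests.get(
    "http://10.249.4.203:5984/foo/_design/foo/_nouveau/search",
    params={"q": "_id:doc1706*"},
    auth=("admin", "password"),  # placeholder credentials
)
print(resp.status_code)  # 500 with {"error": "badrecord", ...} while a node is down
print(resp.json())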
Is it not supported for Nouveau to work with only the local node in this case, or do I need to change something for this to work?
Thanks.
Nouveau will be marked EXPERIMENTAL in CouchDB 3.4.0 while we gather feedback from the community.
I certainly expect fault tolerance from Nouveau, so I will look into your finding this week.
As an aside, we strongly recommend that CouchDB clusters do not span locations (offices, in your case). The nodes of any given cluster should be very close together (<1 ms ping time). For your use case we'd recommend a cluster per office, using the HTTP replication facility to sync data between offices, as sketched below.
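For example, a continuous replication between two offices could look like this (a minimal sketch in Python; the office URLs, credentials, and database name are placeholders):

import requests

# Placeholders: adjust the URLs, credentials, and database name to your setup.
OFFICE_A = "http://admin:[email protected]:5984"
OFFICE_B = "http://admin:[email protected]:5984"

# A continuous pull replication on office A, fetching changes from office B.
# Create the mirror-image document on office B to sync in both directions.
requests.post(
    f"{OFFICE_A}/_replicator",
    json={
        "source": f"{OFFICE_B}/foo",
        "target": f"{OFFICE_A}/foo",
        "continuous": True,
    },
).raise_for_status()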
As a further aside, you don't have to have a Nouveau node for each CouchDB node; you can safely point multiple CouchDB nodes at the same Nouveau node. Whether this is better or worse for you will depend on what you're doing and the performance specs of the server(s) Nouveau is running on. One-to-one is a sensible place to start, though.
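For illustration, pointing a CouchDB node at a shared Nouveau server is a matter of setting the url key in the [nouveau] config section, e.g. via the node config API (a sketch in Python, assuming the standard [nouveau] url setting; hostnames and credentials are placeholders):

import requests

# Equivalent local.ini setting on each CouchDB node:
#   [nouveau]
#   enable = true
#   url = http://nouveau.example.com:8080
requests.put(
    "http://admin:[email protected]:5984/_node/_local/_config/nouveau/url",
    json="http://nouveau.example.com:8080",  # shared Nouveau server (placeholder)
).raise_for_status()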
I figured it out and have posted a PR to fix the fault tolerance.
Thanks for the fast fix. I will test this later.
> As an aside, we strongly recommend that CouchDB clusters do not span locations
I thought that with the following configuration there would be no problem using a cluster instead of replication:
[cluster]
q=1
n=3
w=1
r=1
But if that's not supported, we need to switch to replication. In that case we need to replicate database deletions in the application layer, right?
> You don't have to have a Nouveau node for each CouchDB node
We do this because we need every location to keep working independently if there is a network failure between them.
I still have some questions about how I should set up CouchDB for our use case. Where should I ask these questions? This issue is not the right place. Should I open another issue or use the mailing list?
Our Slack is the best place for this kind of chat (couchdb.slack.com).
A couple of notes:
- The r and w fields under [cluster] do nothing; they have not been used by the code for several years now.
- Erlang clusters (and therefore also CouchDB clusters) need low latency between nodes, and reliable networking too. If these conditions aren't met, we recommend separate clusters that use replication to push data around, as our replication system is tolerant of high latency and unreliable networking.
- Oh, and yes: you would need to delete databases within each cluster separately, as database deletion is not propagated by replication.
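In the application layer, that could look something like this (a sketch in Python; the cluster URLs and credentials are placeholders):

import requests

# Placeholders: one base URL per office cluster.
CLUSTERS = [
    "http://admin:[email protected]:5984",
    "http://admin:[email protected]:5984",
]

def delete_database_everywhere(db_name):
    # Database deletion is not propagated by replication, so the
    # application must issue DELETE /{db} against every cluster itself.
    for base in CLUSTERS:
        resp = requests.delete(f"{base}/{db_name}")
        # 404 means the database was already gone on that cluster.
        if resp.status_code not in (200, 202, 404):
            resp.raise_for_status()

delete_database_everywhere("foo")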