
`khepri_db`: `function_clause` in `rabbit_federation_exchange_link_sup_sup` on network disconnect

Open lukebakken opened this issue 1 year ago • 3 comments

Describe the bug

Disconnecting the network to one node of a 3-node khepri-enabled cluster eventually results in a strange function_clause error:

rmq0-function_clause-stack.txt

The same error also originates from the rabbit_federation_queue_link_sup_sup process. My test project enables the rabbitmq_federation plugin but does not create any federation links.

Reproduction steps

  • Start cluster
    git clone [email protected]:lukebakken/docker-rabbitmq-cluster.git
    cd docker-rabbitmq-cluster
    git checkout khepri
    make DOCKER_FRESH=true clean up
    
  • Disconnect node rmq0
    docker network disconnect rabbitnet docker-rabbitmq-cluster-rmq0-1
    
  • Watch logs until function_clause error happens
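
For the last step, a small filter makes the relevant lines easier to spot in the combined cluster output. This is only a sketch: `filter_issue_lines` is a hypothetical helper name, and the usage line assumes the cluster is run through Docker Compose from the repository checkout, which the container name above suggests but the report does not state.

```shell
# Hypothetical helper: keep only the log lines relevant to this report,
# i.e. the function_clause crash and the net_kernel
# "Cannot get connection id" messages. Reads log text on stdin.
filter_issue_lines() {
  grep -E 'function_clause|Cannot get connection id'
}

# Assumed usage (Docker Compose, run from the docker-rabbitmq-cluster checkout):
#   docker compose logs --follow --no-color | filter_issue_lines
```

The pattern matches both messages so that the crash can be correlated by timestamp with the connection errors reported by the other nodes.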

Expected behavior

No error.

Additional context

This does not appear to affect the normal operation of PerfTest.

In addition, the following log lines appear:

rmq2-1       | 2024-09-11 00:29:10.084227+00:00 [error] <0.181.0>
rmq2-1       | 2024-09-11 00:29:10.084227+00:00 [error] <0.181.0> ** Cannot get connection id for node '[email protected]'
rmq2-1       | 2024-09-11 00:29:10.084227+00:00 [error] <0.181.0>
rmq1-1       | 2024-09-11 00:29:10.096091+00:00 [error] <0.181.0>
rmq1-1       | 2024-09-11 00:29:10.096091+00:00 [error] <0.181.0> ** Cannot get connection id for node '[email protected]'
rmq1-1       | 2024-09-11 00:29:10.096091+00:00 [error] <0.181.0>

These log lines originate in OTP itself:

lbakken@shostakovich ~/development/erlang/otp (master =)
$ git grep -i 'cannot get connection'
lib/kernel/src/net_kernel.erl:1051:            error_logger:error_msg("~n** Cannot get connection id for node ~w~n",
lib/kernel/src/net_kernel.erl:1156:                error_logger:error_msg("~n** Cannot get connection id for node ~w~n",
lib/kernel/src/net_kernel.erl:1545:                    error_logger:error_msg("~n** Cannot get connection id for node ~w~n",

What's odd is that the error messages originate from the very node to which they refer 🤔

lukebakken, Sep 11 '24 00:09