Feature flags detection sometimes triggers `erpc,noconnection`

Open lukebakken opened this issue 2 years ago • 3 comments

Describe the bug

  • Start a RabbitMQ cluster
  • Restart a node

Logs

2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags: on node `rabbit@rabbit2`:
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags:   exception error: {erpc,noconnection}
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags:     in function  erpc:call/5 (erpc.erl, line 710)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags:     in call from rabbit_ff_controller:rpc_call/5 (rabbit_ff_controller.erl, line 1123)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags:     in call from lists:foreach_1/2 (lists.erl, line 1442)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags:     in call from rabbit_feature_flags:check_node_compatibility_v1/2 (rabbit_feature_flags.erl, line 1599)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags:     in call from rabbit_mnesia:check_rabbit_consistency/2 (rabbit_mnesia.erl, line 1017)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags:     in call from rabbit_mnesia:check_consistency/5 (rabbit_mnesia.erl, line 948)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags:     in call from rabbit_mnesia:check_cluster_consistency/2 (rabbit_mnesia.erl, line 746)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> Feature flags:     in call from lists:foldl/3 (lists.erl, line 1350)
2023-05-24 01:39:55.227067-07:00 [error] <0.231.0> 
2023-05-24 01:39:55.243345-07:00 [error] <0.277.0> Mnesia(rabbit@rabbit3): ** ERROR ** Mnesia on rabbit@rabbit3 could not connect to node(s) [rabbit@rabbit2]

Reproduction steps

See above.

Expected behavior

No erpc error: the call is either retried, or not attempted until Erlang distribution (disterl) is definitely up and running.
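
A minimal sketch of the retry option, written in Python purely for illustration. It is not RabbitMQ's implementation; `NoConnection`, `call_with_retry`, and `flaky_rpc` are hypothetical names standing in for the `{erpc,noconnection}` error and the remote call made during the compatibility check.

```python
import time


class NoConnection(Exception):
    """Hypothetical stand-in for Erlang's {erpc, noconnection} error."""


def call_with_retry(rpc, attempts=3, delay=1.0, sleep=time.sleep):
    """Call rpc() up to `attempts` times, retrying only on NoConnection."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return rpc()
        except NoConnection:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            sleep(delay)  # give the distribution link time to come up


# Example: a fake remote call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_rpc():
    calls["count"] += 1
    if calls["count"] < 3:
        raise NoConnection("rabbit@rabbit2")
    return ["rabbit@rabbit2", "rabbit@rabbit3"]


result = call_with_retry(flaky_rpc, attempts=5, delay=0)
```

Only the connection error is retried, so any other failure still surfaces immediately, and the attempt limit keeps a node that never comes back from blocking startup indefinitely.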

Additional context

Observed in the following situations:

  • https://pivotal-esc.atlassian.net/browse/VESC-1073
  • https://github.com/rabbitmq/rabbitmq-server/issues/8114
  • https://vmware.slack.com/archives/C0RDGG81Z/p1684967685447889

lukebakken, May 25 '23 15:05

I think the expected behavior should be "the operation is retried N times" :)

michaelklishin, May 25 '23 15:05

We stumbled over this through user error in #10100. As requested, here are the steps to reproduce the same error message. Bear in mind that this happened to me only because I forgot the "rabbit@" prefix when calling join_cluster:

$ docker network create test_network
1947438e01b9cced503ba3044be1afb1f5a6225fb64d265257b3547b947cad64
$ docker run -d --network test_network --name rabbit1 --privileged -v $(pwd)/cookie:/var/lib/rabbitmq/.erlang.cookie pivotalrabbitmq/rabbitmq:main-otp-max-bazel
b29a66ec3350cb7ee60975d3a1b8c0bd7918313f30833be76a113d0ea0c78590
$ docker container ls
CONTAINER ID        IMAGE                                         COMMAND                  CREATED             STATUS              PORTS                                                                                                                      NAMES
b29a66ec3350        pivotalrabbitmq/rabbitmq:main-otp-max-bazel   "docker-entrypoint.s…"   38 seconds ago      Up 36 seconds       1883/tcp, 4369/tcp, 5551-5552/tcp, 5671-5672/tcp, 8883/tcp, 15670-15676/tcp, 15691-15692/tcp, 25672/tcp, 61613-61614/tcp   rabbit1
$ docker exec -it b2 /bin/bash
root@b29a66ec3350:/# rabbitmqctl join_cluster this_node_does_not_exist
Clustering node rabbit@b29a66ec3350 with this_node_does_not_exist

13:03:53.487 [error] Feature flags: error while running:
Feature flags:   rabbit_ff_controller:running_nodes[]
Feature flags: on node `this_node_does_not_exist@b29a66ec3350`:
Feature flags:   exception error: {erpc,noconnection}
Feature flags:     in function  erpc:call/5 (erpc.erl, line 710)
Feature flags:     in call from rabbit_ff_controller:rpc_call/5 (rabbit_ff_controller.erl, line 1377)
Feature flags:     in call from rabbit_ff_controller:list_nodes_clustered_with/1 (rabbit_ff_controller.erl, line 477)
Feature flags:     in call from rabbit_ff_controller:check_node_compatibility_task/2 (rabbit_ff_controller.erl, line 389)
Feature flags:     in call from rabbit_db_cluster:can_join/1 (rabbit_db_cluster.erl, line 65)
Feature flags:     in call from rabbit_db_cluster:join/2 (rabbit_db_cluster.erl, line 97)
Feature flags:     in call from erpc:execute_call/4 (erpc.erl, line 589)

Error:
{:aborted_feature_flags_compat_check, {:error, {:erpc, :noconnection}}}
root@b29a66ec3350:/# 

kepakiano, Dec 14 '23 13:12

It's not clear to me from this log what exactly logs this message: the node, or the shell where `rabbitmqctl join_cluster this_node_does_not_exist` is executed?

In any case, join_cluster should bail early if it cannot contact the node it is supposed to join.

michaelklishin, Dec 14 '23 23:12
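
The "bail early" idea from the last comment can be sketched as follows, again in Python for illustration only. The function and its parameters (`join_cluster`, `is_reachable`, `do_join`) are hypothetical and do not correspond to RabbitMQ's actual API; the point is only that a reachability check runs before any feature flag compatibility check or join logic.

```python
def join_cluster(target, is_reachable, do_join):
    """Try to join `target`, returning an error before any other work
    if the target node cannot be contacted."""
    if not is_reachable(target):
        # Fail fast with a plain message instead of a stack trace
        # from a later remote call.
        return ("error", f"cannot contact node {target}")
    return ("ok", do_join(target))


# Example: the target node does not exist, so the join step never runs.
joined = []
outcome = join_cluster(
    "this_node_does_not_exist@b29a66ec3350",
    is_reachable=lambda node: False,  # simulate a failed connection attempt
    do_join=joined.append,
)
```

With a check like this, a mistyped node name would produce a single "cannot contact node" error rather than the feature flags stack trace shown in the transcript above.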