
Starter Recovery - misleading log message

Open • maxkernbach opened this issue 7 years ago • 1 comment • https://github.com/arangodb-helper/arangodb/issues/184

The starter recovery procedure (https://docs.arangodb.com/devel/Manual/Administration/Starter/) is working; I just noticed one misleading starter log message.

Setup: local starter cluster with arangodb endpoints 8528 (db1), 8538 (db2), 8548 (db3)

  • kill -9 the arangodb starter and all arangod processes it started, to simulate a node crash
  • create a RECOVERY file in a new directory (echo "127.0.0.1:8548" > db4/RECOVERY)

Recovery log:

/usr/bin/arangodb --starter.data-dir=/home/max/Documents/starter/db4 --starter.join 127.0.0.1
2018-07-18T10:43:39+02:00 |INFO| Starting arangodb version 0.12.0, build cd81a60 component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Trying to recover as starter localhost:8548 component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Recovery information all available, starting... component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Contacting master http://127.0.0.1:8528... component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Waiting for 3 servers to show up... component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Serving as slave with ID 'ab9a0fdb' on :8528... component=arangodb
2018-07-18T10:43:39+02:00 |INFO| ArangoDB Starter listening on 0.0.0.0:8548 (:8548) component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Looking for a running instance of agent on port 8571 component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Starting agent on port 8571 component=arangodb
2018-07-18T10:43:40+02:00 |INFO| Looking for a running instance of dbserver on port 8570 component=arangodb
2018-07-18T10:43:40+02:00 |INFO| Starting dbserver on port 8570 component=arangodb
2018-07-18T10:43:40+02:00 |INFO| agent up and running (version 3.2.16). component=arangodb
2018-07-18T10:43:41+02:00 |INFO| Looking for a running instance of coordinator on port 8569 component=arangodb
2018-07-18T10:43:41+02:00 |INFO| Starting coordinator on port 8569 component=arangodb
2018-07-18T10:43:41+02:00 |INFO| dbserver up and running (version 3.2.16). component=arangodb
2018-07-18T10:43:42+02:00 |INFO| coordinator up and running (version 3.2.16). component=arangodb
2018-07-18T10:43:42+02:00 |INFO| Your cluster can now be accessed with a browser at `http://localhost:8569` or component=arangodb
2018-07-18T10:43:42+02:00 |INFO| using `arangosh --server.endpoint tcp://localhost:8569`. component=arangodb
2018-07-18T10:43:42+02:00 |INFO| Removed RECOVERY file. component=arangodb
2018-07-18T10:43:42+02:00 |INFO| Most likely there is now an extra coordinator & dbserver in FAILED state. Remove them manually using the web UI. component=arangodb
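The port mismatch can be checked directly against the recovery log above. The following Python sketch extracts the port the starter reports and the ports of the servers it starts, then subtracts the default offsets (coordinator +1, dbserver +2, agent +3, as assumed in this report) to infer the port the starter is really using. The helpers `ports_from_log` and `inferred_starter_port` are hypothetical and not part of the arangodb starter.

```python
import re

# Relevant lines copied verbatim from the recovery log above.
LOG = """\
2018-07-18T10:43:39+02:00 |INFO| ArangoDB Starter listening on 0.0.0.0:8548 (:8548) component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Starting agent on port 8571 component=arangodb
2018-07-18T10:43:40+02:00 |INFO| Starting dbserver on port 8570 component=arangodb
2018-07-18T10:43:41+02:00 |INFO| Starting coordinator on port 8569 component=arangodb
"""

# Assumed offsets of each server from the starter's own port.
OFFSETS = {"coordinator": 1, "dbserver": 2, "agent": 3}


def ports_from_log(log: str) -> tuple[int, dict[str, int]]:
    """Return the port the starter claims to listen on and each server's port."""
    match = re.search(r"listening on [\d.]+:(\d+)", log)
    if match is None:
        raise ValueError("no 'listening on' line found")
    reported = int(match.group(1))
    servers = {
        role: int(port)
        for role, port in re.findall(r"Starting (\w+) on port (\d+)", log)
    }
    return reported, servers


def inferred_starter_port(servers: dict[str, int]) -> int:
    """Subtract each server's offset; all roles must agree on one base port."""
    bases = {port - OFFSETS[role] for role, port in servers.items()}
    if len(bases) != 1:
        raise ValueError(f"inconsistent base ports: {sorted(bases)}")
    return bases.pop()


reported, servers = ports_from_log(LOG)
actual = inferred_starter_port(servers)
print(f"reported {reported}, inferred {actual}")  # reported 8548, inferred 8568
```

Run against the log above, the reported port (8548) and the inferred port (8568) disagree, which is exactly the misleading message this issue is about.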
  • The correct arangodb port is shown when the recovery starts:
    2018-07-18T10:43:39+02:00 |INFO| Trying to recover as starter localhost:8548 component=arangodb

  • The wrong arangodb port is shown when the new starter is initialized (8548 is shown, but the port actually in use is 8568):
    2018-07-18T10:43:39+02:00 |INFO| ArangoDB Starter listening on 0.0.0.0:8548 (:8548) component=arangodb

  • The use of port 8568 is expected: the coordinator (8569), dbserver (8570) and agent (8571) sit at the default offsets +1, +2 and +3 from it.

  • When the recovered node is shut down and another recovery is run (new directory db5) with port 8548 in the RECOVERY file, the error output lists the port actually in use, 8568:

/usr/bin/arangodb --starter.data-dir=/home/max/Documents/starter/db5 --starter.join 127.0.0.1
2018-07-18T11:04:56+02:00 |INFO| Starting arangodb version 0.12.0, build cd81a60 component=arangodb
2018-07-18T11:04:56+02:00 |INFO| Trying to recover as starter localhost:8548 component=arangodb
2018-07-18T11:04:56+02:00 |ERRO| Cannot find a peer in cluster configuration for address localhost with port 8548 component=arangodb
2018-07-18T11:04:56+02:00 |INFO| Starters found are: localhost:8528, localhost:8538, localhost:8568 component=arangodb
2018-07-18T11:04:56+02:00 |FATA| Failed to recover component=arangodb error="No peer found for localhost:8548"
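The failing second recovery can be modeled in a few lines. The error message suggests that peers are looked up by address and port, and this Python sketch assumes exactly that. `Peer`, `parse_recovery` and `find_peer` are hypothetical names, and the peer list is copied by hand from the "Starters found are" line, so this illustrates the mismatch rather than reproducing the starter's actual lookup code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Peer:
    address: str
    port: int


# Peers as listed in the "Starters found are" log line above.
PEERS = [Peer("localhost", 8528), Peer("localhost", 8538), Peer("localhost", 8568)]


def parse_recovery(content: str) -> tuple[str, int]:
    """Split RECOVERY file content of the form "host:port" into host and port."""
    host, _, port = content.strip().rpartition(":")
    return host, int(port)


def find_peer(peers: list[Peer], address: str, port: int) -> Optional[Peer]:
    """Return the peer registered under (address, port), or None if absent."""
    for peer in peers:
        if peer.address == address and peer.port == port:
            return peer
    return None


_, port = parse_recovery("127.0.0.1:8548\n")
# The log reports the recovery address as localhost, so look it up under that name.
print(find_peer(PEERS, "localhost", port))  # None: 8548 is not registered
print(find_peer(PEERS, "localhost", 8568))  # Peer(address='localhost', port=8568)
```

Under this model a lookup only succeeds for the port a peer is actually registered under (8568 here), so the port printed in the "listening on" line is not the one a later recovery would need to match.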

maxkernbach, Jul 18 '18 15:07

I fear this is related to the use of localhost for all starters. It is intended to be used on separate machines.


ewoutp, Jul 18 '18 16:07