Starter Recovery - misleading log message
The starter recovery procedure (https://docs.arangodb.com/devel/Manual/Administration/Starter/) is working; I just noticed one misleading starter log message.
Setup: local starter cluster with arangodb endpoints 8528(db1), 8538(db2), 8548(db3)
- kill -9 arangodb and all of its started arangod processes to simulate a node crash
- create recovery file in new directory (echo "127.0.0.1:8548" > db4/RECOVERY)
recovery log:
/usr/bin/arangodb --starter.data-dir=/home/max/Documents/starter/db4 --starter.join 127.0.0.1
2018-07-18T10:43:39+02:00 |INFO| Starting arangodb version 0.12.0, build cd81a60 component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Trying to recover as starter localhost:8548 component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Recovery information all available, starting... component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Contacting master http://127.0.0.1:8528... component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Waiting for 3 servers to show up... component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Serving as slave with ID 'ab9a0fdb' on :8528... component=arangodb
2018-07-18T10:43:39+02:00 |INFO| ArangoDB Starter listening on 0.0.0.0:8548 (:8548) component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Looking for a running instance of agent on port 8571 component=arangodb
2018-07-18T10:43:39+02:00 |INFO| Starting agent on port 8571 component=arangodb
2018-07-18T10:43:40+02:00 |INFO| Looking for a running instance of dbserver on port 8570 component=arangodb
2018-07-18T10:43:40+02:00 |INFO| Starting dbserver on port 8570 component=arangodb
2018-07-18T10:43:40+02:00 |INFO| agent up and running (version 3.2.16). component=arangodb
2018-07-18T10:43:41+02:00 |INFO| Looking for a running instance of coordinator on port 8569 component=arangodb
2018-07-18T10:43:41+02:00 |INFO| Starting coordinator on port 8569 component=arangodb
2018-07-18T10:43:41+02:00 |INFO| dbserver up and running (version 3.2.16). component=arangodb
2018-07-18T10:43:42+02:00 |INFO| coordinator up and running (version 3.2.16). component=arangodb
2018-07-18T10:43:42+02:00 |INFO| Your cluster can now be accessed with a browser at `http://localhost:8569` or component=arangodb
2018-07-18T10:43:42+02:00 |INFO| using `arangosh --server.endpoint tcp://localhost:8569`. component=arangodb
2018-07-18T10:43:42+02:00 |INFO| Removed RECOVERY file. component=arangodb
2018-07-18T10:43:42+02:00 |INFO| Most likely there is now an extra coordinator & dbserver in FAILED state. Remove them manually using the web UI. component=arangodb
- the correct arangodb port is shown when the recovery is started:
  2018-07-18T10:43:39+02:00 |INFO| Trying to recover as starter localhost:8548 component=arangodb
- the wrong arangodb port is shown when the new starter is initialized (8548 shown, used port is 8568):
  2018-07-18T10:43:39+02:00 |INFO| ArangoDB Starter listening on 0.0.0.0:8548 (:8548) component=arangodb
- the use of port 8568 is expected, since the coordinator (8569), dbserver (8570) and agent (8571) sit at the default offsets +1, +2 and +3 from it
- when shutting down the recovered node and doing another recovery process (new directory db5) with port 8548 in the recovery file, the used port 8568 is shown in the error message:
/usr/bin/arangodb --starter.data-dir=/home/max/Documents/starter/db5 --starter.join 127.0.0.1
2018-07-18T11:04:56+02:00 |INFO| Starting arangodb version 0.12.0, build cd81a60 component=arangodb
2018-07-18T11:04:56+02:00 |INFO| Trying to recover as starter localhost:8548 component=arangodb
2018-07-18T11:04:56+02:00 |ERRO| Cannot find a peer in cluster configuration for address localhost with port 8548 component=arangodb
2018-07-18T11:04:56+02:00 |INFO| Starters found are: localhost:8528, localhost:8538, localhost:8568 component=arangodb
2018-07-18T11:04:56+02:00 |FATA| Failed to recover component=arangodb error="No peer found for localhost:8548"
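To make the mismatch easier to follow, here is a minimal Python sketch of the two observations above. It is not the starter's code: the names `COMPONENT_OFFSETS`, `component_ports` and `find_peer` are hypothetical, and the exact host:port match in `find_peer` is only an assumption about how the recovery lookup behaves, based on the error message. The offsets and the peer list are taken from the logs in this report.

```python
# Illustration only; helper names are hypothetical, values come from the logs above.
COMPONENT_OFFSETS = {"coordinator": 1, "dbserver": 2, "agent": 3}


def component_ports(starter_port):
    """Return the port of each server for a starter listening on starter_port."""
    return {name: starter_port + offset for name, offset in COMPONENT_OFFSETS.items()}


def find_peer(peers, address):
    """Return the peer equal to address (host:port), or None if there is no match.

    Assumption: the lookup is an exact string match on host:port.
    """
    return address if address in peers else None


# The first recovery started coordinator 8569, dbserver 8570 and agent 8571,
# which corresponds to a starter port of 8568 rather than the logged 8548.
print(component_ports(8568))

# Peers listed by the second recovery attempt.
peers = ["localhost:8528", "localhost:8538", "localhost:8568"]
print(find_peer(peers, "localhost:8548"))  # None, matching "No peer found"
print(find_peer(peers, "localhost:8568"))
```

Under that assumed lookup, only the port the recovered starter actually uses (8568) would match a peer, which is why a log line announcing 8548 is misleading when preparing the next RECOVERY file.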
I fear this is related to the use of localhost for all starters. It is intended to be used on separate machines.