Cannot start because of error from master - connection refused?

Open Simran-B opened this issue 7 years ago • 5 comments

While Santo and I did some simple educational deployment experiments with the starter, we ran into a situation in which it was not possible to join the cluster anymore:

C:\Demo>ArangoDB\usr\bin\arangodb --starter.data-dir=DataArango4 --log.dir=Log4 --starter.join=127.0.0.1
2018/06/19 17:40:51 Starting arangodb version 0.11.3, build 5c2faa9
2018/06/19 17:40:51 Contacting master http://127.0.0.1:8528...
2018/06/19 17:40:52 Cannot start because of error from master: Post http://127.0.0.1:8528/hello: dial tcp 127.0.0.1:8528: connectex: Es konnte keine Verbindung hergestellt werden, da der Zielcomputer die Verbindung verweigerte.

The last line reads: "No connection could be made because the target machine actively refused it."
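A "connection refused" error generally means that nothing was accepting TCP connections on that address and port at that moment. A minimal sketch for checking this independently of the starter is shown below, assuming the address 127.0.0.1 and port 8528 taken from the log above. The helper name `port_is_open` is hypothetical and not part of the starter.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Address and port are taken from the "Contacting master" log line.
    host, port = "127.0.0.1", 8528
    state = "accepting" if port_is_open(host, port) else "not accepting"
    print(f"{host}:{port} is {state} connections")
```

If this reports that the port is not accepting connections, no process is listening on the address the joining starter tries to reach. That narrows the problem down but does not explain why the listener is gone.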

We used ArangoDB 3.3.10 on Windows 10 Pro, on a single machine. There were 3 nodes initially, and everything worked fine. Then we took down the first node (the one we had started without --starter.join 127.0.0.1). ArangoDB itself kept working, but when we tried to bring this node up again, it apparently couldn't contact the starter master. (I believe I ran the starter without any parameters at some point, but killed the process immediately and tried something else. I'm not sure if that could make a difference.)

Under what circumstances would this be expected? I also tried to add a new 4th node, but it had the same problem. Then I took down all nodes and brought all 4 up again (each with --starter.join 127.0.0.1), and it worked fine.

Simran-B avatar Jun 19 '18 16:06 Simran-B

I'm still not sure how to reproduce this exactly, but it just happened again. Then I shut down everything and tried again, but the 4th node gave me this:

2018/06/19 19:21:56 Starting arangodb version 0.11.3, build 5c2faa9
2018/06/19 19:21:56 Contacting master http://127.0.0.1:8528...
2018/06/19 19:21:56 Cannot start because of HTTP error from master: code=400, message=Cannot use same directory as peer.: bad request

Simran-B avatar Jun 19 '18 17:06 Simran-B

@Simran-B can you explain exactly what you did when "we took down the first node"?

ewoutp avatar Jun 22 '18 11:06 ewoutp

Basically spamming Ctrl+C in the terminal of the first node to terminate the starter forcefully.

Simran-B avatar Jun 22 '18 11:06 Simran-B

In cases like this, please provide ALL starter logs.

ewoutp avatar Jun 25 '18 06:06 ewoutp

I'll try to reproduce the problem with a fresh environment and provide all logs.

Simran-B avatar Jun 25 '18 07:06 Simran-B