arangodb
Cannot start because of error from master - connection refused?
While Santo and I did some simple educational deployment experiments with the starter, we ran into a situation in which it was not possible to join the cluster anymore:
C:\Demo>ArangoDB\usr\bin\arangodb --starter.data-dir=DataArango4 --log.dir=Log4 --starter.join=127.0.0.1
2018/06/19 17:40:51 Starting arangodb version 0.11.3, build 5c2faa9
2018/06/19 17:40:51 Contacting master http://127.0.0.1:8528...
2018/06/19 17:40:52 Cannot start because of error from master: Post http://127.0.0.1:8528/hello: dial tcp 127.0.0.1:8528: connectex: Es konnte keine Verbindung hergestellt werden, da der Zielcomputer die Verbindung verweigerte.
The last line translates to "No connection could be established because the target computer refused it."
We used ArangoDB 3.3.10 under Windows 10 Pro, on a single machine. There were 3 nodes initially and everything worked fine. Then we took down the first node (the one started without `--starter.join 127.0.0.1`). ArangoDB itself kept working, but when we tried to bring this node up again, it apparently could not contact the starter master anymore. (At some point I believe I ran the starter without any parameters, but I killed the process immediately and tried something else; I'm not sure whether that could make a difference.)
Under what circumstances would this be expected? I also tried to add a 4th, new node, but it hit the same problem. Then I took down all nodes and brought all four up again (each with `--starter.join 127.0.0.1`), and it worked fine.
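For reference, the symmetric restart that worked can be sketched like this (a sketch only; the directory names mirror the `DataArango4`/`Log4` pattern from the command above and are otherwise illustrative, and 8528 is the starter's default port). Each starter gets its own data and log directory, and each one passes `--starter.join 127.0.0.1`:

```shell
REM Sketch of the working 4-node restart on one Windows machine.
REM Run each command in its own terminal, from C:\Demo.
ArangoDB\usr\bin\arangodb --starter.data-dir=DataArango1 --log.dir=Log1 --starter.join=127.0.0.1
ArangoDB\usr\bin\arangodb --starter.data-dir=DataArango2 --log.dir=Log2 --starter.join=127.0.0.1
ArangoDB\usr\bin\arangodb --starter.data-dir=DataArango3 --log.dir=Log3 --starter.join=127.0.0.1
ArangoDB\usr\bin\arangodb --starter.data-dir=DataArango4 --log.dir=Log4 --starter.join=127.0.0.1
```

Passing the join address on every node keeps the invocations uniform, so no single starter is special the way the original first node (started without `--starter.join`) was.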
Still not sure how to reproduce this exactly, but it just happened again. I then shut everything down and tried again, but the 4th node gave me this:
2018/06/19 19:21:56 Starting arangodb version 0.11.3, build 5c2faa9
2018/06/19 19:21:56 Contacting master http://127.0.0.1:8528...
2018/06/19 19:21:56 Cannot start because of HTTP error from master: code=400, message=Cannot use same directory as peer.: bad request
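A hedged reading of that 400 error: the master apparently still has a peer registered for that data directory, so a starter trying to join with the same `--starter.data-dir` is rejected. One way to get an extra starter running in that situation is to point it at a fresh directory (`DataArango5`/`Log5` below are hypothetical names, not from the thread):

```shell
REM The master refuses a joining starter whose data dir is already
REM registered to a known peer. A fresh, unused directory avoids that:
ArangoDB\usr\bin\arangodb --starter.data-dir=DataArango5 --log.dir=Log5 --starter.join=127.0.0.1
```

Whether reusing the old `DataArango4` directory should have worked after a clean shutdown is exactly the open question in this thread.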
@Simran-B can you explain exactly what you did to "take down the first node"?
Basically I spammed Ctrl+C in the terminal of the first node to terminate the starter forcefully.
In cases like this, please provide ALL starter logs.
I'll try to reproduce the problem with a fresh environment and provide all logs.