
Pilosa fails during cluster join process if another node initiates join

Open dene14 opened this issue 7 years ago • 9 comments

  • Cluster of 3 nodes
  • Replication factor 3
  • The cluster is empty (no indexes created)
  • Reproducible in 99% of attempts

Let me know what else you need to investigate.

2018/10/11 13:24:34 load NodeID: /data/pilosa/.id
2018/10/11 13:24:34 add node Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a to cluster on Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a
2018/10/11 13:24:34 [DEBUG] memberlist: Stream connection from=10.239.5.194:44388
2018/10/11 13:24:34 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.5.194:11101
2018/10/11 13:24:34 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.3.156:11101
2018/10/11 13:24:34 open server
2018/10/11 13:24:34 SendSync to: http://dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101
2018/10/11 13:24:34 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a wait for joining to complete
2018/10/11 13:24:34 monitor primary store events
2018/10/11 13:24:34 received NodeJoin event: &{0 Node: 923ada43-1d6a-4f2a-a477-19bffdaa9fbf}
2018/10/11 13:24:38 merge cluster status: &{5246c7a7-2127-4700-829d-1883f4613960 NORMAL [Node: 923ada43-1d6a-4f2a-a477-19bffdaa9fbf Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a Node: fc74f2de-7555-4581-bf25-a196bcd3f390]}
2018/10/11 13:24:38 add node Node: 923ada43-1d6a-4f2a-a477-19bffdaa9fbf to cluster on Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a
2018/10/11 13:24:38 add node Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a to cluster on Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a
2018/10/11 13:24:38 add node Node: fc74f2de-7555-4581-bf25-a196bcd3f390 to cluster on Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a
2018/10/11 13:24:38 change cluster state from STARTING to NORMAL on 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a
2018/10/11 13:24:38 mark node as joined (received coordinator update)
2018/10/11 13:24:38 stop monitor replication
2018/10/11 13:24:38 set primary translate store to 923ada43-1d6a-4f2a-a477-19bffdaa9fbf
2018/10/11 13:24:38 start monitor replication
2018/10/11 13:24:38 pilosa: replicating from offset 0
2018/10/11 13:24:38 joining has completed
2018/10/11 13:24:38 open holder path: /data/pilosa
2018/10/11 13:24:38 open holder: complete
2018/10/11 13:24:38 Sending State READY (923ada43-1d6a-4f2a-a477-19bffdaa9fbf)
2018/10/11 13:24:38 SendTo: http://dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101
2018/10/11 13:24:38 merge cluster status: &{5246c7a7-2127-4700-829d-1883f4613960 STARTING [Node: 923ada43-1d6a-4f2a-a477-19bffdaa9fbf Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a Node: fc74f2de-7555-4581-bf25-a196bcd3f390]}
2018/10/11 13:24:38 add node Node: 923ada43-1d6a-4f2a-a477-19bffdaa9fbf to cluster on Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a
2018/10/11 13:24:38 add node Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a to cluster on Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a
2018/10/11 13:24:38 add node Node: fc74f2de-7555-4581-bf25-a196bcd3f390 to cluster on Node: 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a
2018/10/11 13:24:38 change cluster state from NORMAL to STARTING on 99d1ce3b-8d7f-41d0-980f-e6bbdda9061a
2018/10/11 13:24:38 mark node as joined (received coordinator update)
Error: running server: opening server: setting nodeState: sending node state error: err=sending: unexpected response status code: 400: receiving message: executing http request: Post http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101/internal/cluster/message: dial tcp: lookup dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
{}

running server: opening server: setting nodeState: sending node state error: err=sending: unexpected response status code: 400: receiving message: executing http request: Post http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101/internal/cluster/message: dial tcp: lookup dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
{}

Usage:
  pilosa server [flags]

Flags:
      --anti-entropy.interval duration       Interval at which to run anti-entropy routine. (default 10m0s)
  -b, --bind string                          Default URI on which pilosa should listen. (default ":10101")

dene14 avatar Oct 11 '18 13:10 dene14

If you could post the configuration for each node, and perhaps the order in which they are being started, we will attempt to reproduce.

jaffee avatar Oct 11 '18 16:10 jaffee

All settings are provided via environment variables.

export PILOSA_BIND='10.239.41.59:10101'
export PILOSA_CLUSTER_REPLICAS='3'
export PILOSA_DATA_DIR='/data/pilosa'
export PILOSA_GOSSIP_PORT='11101'
export PILOSA_GOSSIP_SEEDS='dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:11101,dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:11101,dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:11101'
export PILOSA_VERBOSE='true'

Coordinator node additionally has:

export PILOSA_CLUSTER_COORDINATOR='true'

dev-pilosa-0 has the coordinator role. I have tried starting the nodes both in order, from 0 to 2, and in reverse order.
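The `set -x` trace later in the thread (`+ '[' 0 '==' 0 ]` followed by exporting the coordinator flag) suggests the pod entrypoint derives the coordinator role from the StatefulSet ordinal. A minimal sketch of that check, assuming pod hostnames of the form `dev-pilosa-<ordinal>` (the helper name is illustrative, not from the actual entrypoint):

```shell
# Return success (exit 0) when the given hostname's StatefulSet ordinal
# is 0, i.e. when this pod should act as the cluster coordinator.
is_coordinator() {
  ordinal="${1##*-}"   # strip everything up to the last '-': dev-pilosa-2 -> 2
  [ "$ordinal" = "0" ]
}

# Sketch of how the entrypoint would use it:
#   if is_coordinator "$HOSTNAME"; then
#     export PILOSA_CLUSTER_COORDINATOR='true'
#   fi
#   exec /pilosa server
```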

dene14 avatar Oct 11 '18 16:10 dene14

Possibly this was fixed by #1717; need to verify.

jaffee avatar Nov 26 '18 16:11 jaffee

@dene14 are you still seeing this issue with the latest Pilosa (master or 1.2)?

jaffee avatar Jan 09 '19 16:01 jaffee

@jaffee I've tested my configuration with the pilosa/pilosa:v1.2.0 image today. The coordinator just fails to start:

+ '[' 0 '==' 0 ]
+ export 'PILOSA_CLUSTER_COORDINATOR=true'
+ /pilosa server
2019/01/18 16:21:57 Pilosa v1.2.0, build time 2018-12-20T17:54:50+0000
2019/01/18 16:21:57 load NodeID: /data/pilosa/.id
2019/01/18 16:21:57 [DEBUG] memberlist: Stream connection from=10.239.15.114:49478
2019/01/18 16:21:57 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.15.114:11101
2019/01/18 16:21:57 [WARN] memberlist: Failed to resolve dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:11101: lookup dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
2019/01/18 16:21:57 [WARN] memberlist: Failed to resolve dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:11101: lookup dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
2019/01/18 16:21:57 open server
2019/01/18 16:21:57 open holder path: /data/pilosa
2019/01/18 16:21:57 open holder: complete
2019/01/18 16:21:57 received state READY (1398c1eb-e0ce-41ad-94e0-63bae857a717)
2019/01/18 16:21:57 change cluster state from STARTING to DEGRADED on 1398c1eb-e0ce-41ad-94e0-63bae857a717
2019/01/18 16:21:57 listening as http://dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101
2019/01/18 16:21:57 holder sync monitor initializing (10m0s interval)
2019/01/18 16:21:57 diagnostics disabled
2019/01/18 16:22:20 [DEBUG] memberlist: Stream connection from=10.239.41.104:39062
2019/01/18 16:22:20 nodeJoin of http://dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101 on http://dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101
2019/01/18 16:22:20 node join event on coordinator, node: http://dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101, id: a5163467-4771-49a6-9fef-e1dce6abf1bc
2019/01/18 16:22:20 host is not in topology: a5163467-4771-49a6-9fef-e1dce6abf1bc
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5ec347]

goroutine 22 [running]:
log.(*Logger).Output(0x0, 0x2, 0xc0003e8120, 0x101, 0x0, 0x0)
	/usr/local/go/src/log/log.go:153 +0x47
log.(*Logger).Printf(0x0, 0xd466b6, 0x17, 0xc000153fa8, 0x1, 0x1)
	/usr/local/go/src/log/log.go:179 +0x7e
github.com/pilosa/pilosa/gossip.(*eventReceiver).listen(0xc00028bb20)
	/go/src/github.com/pilosa/pilosa/gossip/gossip.go:392 +0x2a0
created by github.com/pilosa/pilosa/gossip.newEventReceiver
	/go/src/github.com/pilosa/pilosa/gossip/gossip.go:331 +0x9e

I haven't changed anything in the configuration, but v1.1.0 gets through at least the initial cluster spin-up just fine. Am I missing some changes in the configuration parameters?

dene14 avatar Jan 18 '19 16:01 dene14

That looks like we aren't setting up the logger properly; I'll debug it.

As a workaround, you could try setting a log path in your configuration: PILOSA_LOG_PATH="/path/to/logfile". That might bypass this problem.
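For example, alongside the other environment variables (the path here is illustrative; only the variable name comes from the comment above):

```shell
# Workaround sketch: point Pilosa's logger at a file rather than relying
# on the default logger setup that appears to be nil in the panic above.
export PILOSA_LOG_PATH="/data/pilosa/pilosa.log"
```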

jaffee avatar Jan 18 '19 16:01 jaffee

@jaffee Then it dies silently... The log contains only this part:

2019/01/18 17:06:46 [WARN] memberlist: Failed to resolve dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:11101: lookup dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
2019/01/18 17:06:46 [WARN] memberlist: Failed to resolve dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:11101: lookup dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
2019/01/18 17:06:46 open server
2019/01/18 17:06:46 open holder path: /data/pilosa
2019/01/18 17:06:46 open holder: complete
2019/01/18 17:06:46 received state READY (7d262bf0-423a-4ac5-a903-2cfb2d109108)
2019/01/18 17:06:46 change cluster state from STARTING to NORMAL on 7d262bf0-423a-4ac5-a903-2cfb2d109108
2019/01/18 17:06:46 listening as http://dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101
2019/01/18 17:06:46 diagnostics disabled
2019/01/18 17:06:46 holder sync monitor initializing (10m0s interval)


command terminated with exit code 137

(The last line is from Docker; ignore it.)

dene14 avatar Jan 18 '19 17:01 dene14

@dene14 You can try a build from https://github.com/pilosa/pilosa/pull/1835

You must be having some issue beyond that though, because the line that is being logged is when Pilosa fails to receive an internal cluster message for some reason. This fix will unmask that error so we can see what it is.

jaffee avatar Jan 18 '19 20:01 jaffee

@jaffee now that you've merged it into master, I was able to test. It's definitely better (I can't say how much better yet). At least the cluster is able to start from scratch and join nodes, though it reports the STARTING state.

I noticed one minor thing in the following chain of events (a log of Docker container state changes):

dev-pilosa-0   0/1   Pending   0     0s
dev-pilosa-0   0/1   Pending   0     0s
dev-pilosa-0   0/1   ContainerCreating   0     0s
dev-pilosa-0   0/1   Running   0     10s
dev-pilosa-0   1/1   Running   0     72s
dev-pilosa-1   0/1   Pending   0     0s
dev-pilosa-1   0/1   Pending   0     0s
dev-pilosa-1   0/1   ContainerCreating   0     0s
dev-pilosa-1   0/1   Running   0     10s
dev-pilosa-1   1/1   Running   0     78s
dev-pilosa-2   0/1   Pending   0     0s
dev-pilosa-2   0/1   Pending   0     0s
dev-pilosa-2   0/1   ContainerCreating   0     0s
dev-pilosa-1   0/1   Error   0     88s
dev-pilosa-2   0/1   Running   0     10s
dev-pilosa-1   0/1   Running   1     89s
dev-pilosa-1   1/1   Running   1     2m28s
dev-pilosa-2   1/1   Running   0     72s

As you can see, dev-pilosa-1 failed as soon as dev-pilosa-2 advertised itself. It seems the advertisement is sent too early, while the listening socket is not ready yet. Logs from dev-pilosa-1:

+ /pilosa server
2019/01/23 22:35:56 Pilosa v1.2.0-79-gb86f0c67, build time 2019-01-21T22:48:42+0000
2019/01/23 22:35:56 load NodeID: /data/pilosa/.id
2019/01/23 22:35:56 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.39.48:11101
2019/01/23 22:35:56 nodeJoin of http://dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101 on http://dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101
2019/01/23 22:35:56 [DEBUG] memberlist: Stream connection from=10.239.2.0:46396
2019/01/23 22:35:56 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.2.0:11101
2019/01/23 22:35:56 [WARN] memberlist: Failed to resolve dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:11101: lookup dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
2019/01/23 22:35:56 open server
2019/01/23 22:35:56 6d32e594-b3af-4da5-bb22-a1bbb2c2e2a3 wait for joining to complete
2019/01/23 22:36:00 [DEBUG] memberlist: Stream connection from=10.239.39.48:41962
2019/01/23 22:36:30 [DEBUG] memberlist: Stream connection from=10.239.39.48:42282
2019/01/23 22:36:37 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.39.48:11101
2019/01/23 22:37:00 [DEBUG] memberlist: Stream connection from=10.239.39.48:42536
2019/01/23 22:37:07 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.39.48:11101
2019/01/23 22:37:13 merge cluster status: &{971145dd-ed0a-45f1-933b-15cc6d31a625 NORMAL [Node:http://dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101:READY:1975c61e-86ed-4ba0-b9c6-447b4c704cd7 Node:http://dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101:DOWN:6d32e594-b3af-4da5-bb22-a1bbb2c2e2a3 Node:http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101:DOWN:876bacb9-c255-4d12-90c3-0f6967f5a1ba]}
2019/01/23 22:37:13 change cluster state from STARTING to NORMAL on 6d32e594-b3af-4da5-bb22-a1bbb2c2e2a3
2019/01/23 22:37:13 joining has completed
2019/01/23 22:37:13 open holder path: /data/pilosa
2019/01/23 22:37:13 set primary translate store to 1975c61e-86ed-4ba0-b9c6-447b4c704cd7
2019/01/23 22:37:13 pilosa: replicating from offset 0
2019/01/23 22:37:13 open holder: complete
2019/01/23 22:37:13 sending state READY (1975c61e-86ed-4ba0-b9c6-447b4c704cd7)
2019/01/23 22:37:13 [DEBUG] memberlist: Stream connection from=10.239.23.84:60286
2019/01/23 22:37:13 nodeJoin of http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101 on http://dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101
2019/01/23 22:37:13 merge cluster status: &{971145dd-ed0a-45f1-933b-15cc6d31a625 STARTING [Node:http://dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101:READY:1975c61e-86ed-4ba0-b9c6-447b4c704cd7 Node:http://dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101:READY:6d32e594-b3af-4da5-bb22-a1bbb2c2e2a3 Node:http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101:DOWN:876bacb9-c255-4d12-90c3-0f6967f5a1ba]}
2019/01/23 22:37:13 change cluster state from NORMAL to STARTING on 6d32e594-b3af-4da5-bb22-a1bbb2c2e2a3
Error: running server: opening server: setting nodeState: sending node state error: err=sending: executing request: server error 400 Bad Request: 'receiving message: executing request: executing request: Post http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101/internal/cluster/message: dial tcp: lookup dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
{}
'
Usage:
  pilosa server [flags]

Flags:
      --anti-entropy.interval duration       Interval at which to run anti-entropy routine. (default 10m0s)
  -b, --bind string                          Default URI on which pilosa should listen. (default ":10101")
      --cluster.coordinator                  Host that will act as cluster coordinator during startup and resizing.
      --cluster.disabled                     Disabled multi-node cluster communication (used for testing)
      --cluster.hosts strings                Comma separated list of hosts in cluster. Only used for testing.
      --cluster.long-query-time duration     Duration that will trigger log and stat messages for slow queries. (default 1m0s)
      --cluster.replicas int                 Number of hosts each piece of data should be stored on. (default 1)
  -d, --data-dir string                      Directory to store pilosa data files. (default "~/.pilosa")
      --gossip.interval duration             Interval between sending messages that need to be gossiped that haven't piggybacked on probing messages. (default 200ms)
      --gossip.key string                    The path to file of the encryption key for gossip. The contents of the file should be either 16, 24, or 32 bytes to select AES-128, AES-192, or AES-256.
      --gossip.nodes int                     Number of random nodes to send gossip messages to per GossipInterval. (default 3)
      --gossip.port string                   Port to which pilosa should bind for internal state sharing. (default "14000")
      --gossip.probe-interval duration       Interval between random node probes. (default 1s)
      --gossip.probe-timeout duration        Timeout to wait for an ack from a probed node before assuming it is unhealthy. (default 500ms)
      --gossip.push-pull-interval duration   Interval between complete state syncs. (default 30s)
      --gossip.seeds strings                 Host with which to seed the gossip membership.
      --gossip.stream-timeout duration       Timeout for establishing a stream connection with a remote node for a full state sync. (default 10s)
      --gossip.suspicion-mult int            Multiplier for determining the time an inaccessible node is considered suspect before declaring it dead. (default 4)
      --gossip.to-the-dead-time duration     Interval after which a node has died that we will still try to gossip to it. (default 30s)
      --handler.allowed-origins strings      Comma separated list of allowed origin URIs (for CORS/WebUI).
  -h, --help                                 help for server
      --log-path string                      Log path
      --max-writes-per-request int           Number of write commands per request. (default 5000)
      --metric.diagnostics                   Enabled diagnostics reporting. (default true)
      --metric.host string                   Default URI to send metrics.
      --metric.poll-interval duration        Polling interval metrics.
      --metric.service string                Default URI on which pilosa should listen. (default "none")
      --tls.certificate string               TLS certificate path (usually has the .crt or .pem extension
      --tls.key string                       TLS certificate key path (usually has the .key extension
      --tls.skip-verify                      Skip TLS certificate verification (not secure)
      --tracing.agent-host-port string       Jaeger agent host:port.
      --tracing.sampler-param float          Jaeger sampler parameter. (default 0.001)
      --tracing.sampler-type string          Jaeger sampler type. (default "remote")
      --translation.map-size int             Size in bytes of mmap to allocate for key translation.
      --translation.primary-url string       DEPRECATED: URL for primary translation node for replication.
      --verbose                              Enable verbose logging

Global Flags:
  -c, --config string   Configuration file to read from.

running server: opening server: setting nodeState: sending node state error: err=sending: executing request: server error 400 Bad Request: 'receiving message: executing request: executing request: Post http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101/internal/cluster/message: dial tcp: lookup dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
{}
'

The same log for the initial run, from the coordinator node:

+ export 'PILOSA_CLUSTER_COORDINATOR=true'
+ /pilosa server
2019/01/23 22:34:44 Pilosa v1.2.0-79-gb86f0c67, build time 2019-01-21T22:48:42+0000
2019/01/23 22:34:44 load NodeID: /data/pilosa/.id
2019/01/23 22:34:44 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.39.48:11101
2019/01/23 22:34:44 [DEBUG] memberlist: Stream connection from=10.239.39.48:56838
2019/01/23 22:34:44 [WARN] memberlist: Failed to resolve dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:11101: lookup dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
2019/01/23 22:34:44 [WARN] memberlist: Failed to resolve dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:11101: lookup dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
2019/01/23 22:34:44 open server
2019/01/23 22:34:44 open holder path: /data/pilosa
2019/01/23 22:34:44 open holder: complete
2019/01/23 22:34:44 received state READY (1975c61e-86ed-4ba0-b9c6-447b4c704cd7)
2019/01/23 22:34:44 change cluster state from STARTING to DEGRADED on 1975c61e-86ed-4ba0-b9c6-447b4c704cd7
2019/01/23 22:34:44 listening as http://dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101
2019/01/23 22:34:44 diagnostics disabled
2019/01/23 22:34:44 holder sync monitor initializing (10m0s interval)
2019/01/23 22:35:56 [DEBUG] memberlist: Stream connection from=10.239.2.0:60690
2019/01/23 22:35:56 nodeJoin of http://dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101 on http://dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101
2019/01/23 22:35:56 node join event on coordinator, node: http://dev-pilosa-1.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101, id: 6d32e594-b3af-4da5-bb22-a1bbb2c2e2a3
2019/01/23 22:36:00 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.2.0:11101
2019/01/23 22:36:30 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.2.0:11101
2019/01/23 22:36:37 [DEBUG] memberlist: Stream connection from=10.239.2.0:33036
2019/01/23 22:37:00 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.2.0:11101
2019/01/23 22:37:07 [DEBUG] memberlist: Stream connection from=10.239.2.0:33426
2019/01/23 22:37:13 [DEBUG] memberlist: Stream connection from=10.239.23.84:58658
2019/01/23 22:37:13 nodeJoin of http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101 on http://dev-pilosa-0.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101
2019/01/23 22:37:13 node join event on coordinator, node: http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101, id: 876bacb9-c255-4d12-90c3-0f6967f5a1ba
2019/01/23 22:37:13 change cluster state from DEGRADED to NORMAL on 1975c61e-86ed-4ba0-b9c6-447b4c704cd7
2019/01/23 22:37:13 receive event error: receiving message: cluster receiving NodeEvent &{0 Node:http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101:DOWN:876bacb9-c255-4d12-90c3-0f6967f5a1ba}: executing request: executing request: Post http://dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local:10101/internal/cluster/message: dial tcp: lookup dev-pilosa-2.dev-pilosa-headless.dev-pilosa.svc.cluster.local on 10.239.0.10:53: no such host
2019/01/23 22:37:13 received state READY (6d32e594-b3af-4da5-bb22-a1bbb2c2e2a3)
2019/01/23 22:37:13 change cluster state from NORMAL to STARTING on 1975c61e-86ed-4ba0-b9c6-447b4c704cd7
2019/01/23 22:37:13 http: translate store read error: context canceled
2019/01/23 22:37:14 [DEBUG] memberlist: Stream connection from=10.239.2.0:33500
2019/01/23 22:37:30 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.23.84:11101
2019/01/23 22:37:51 [DEBUG] memberlist: Stream connection from=10.239.23.84:59168
2019/01/23 22:38:00 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.23.84:11101
2019/01/23 22:38:30 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.23.84:11101
2019/01/23 22:39:00 [DEBUG] memberlist: Initiating push/pull sync with: 10.239.23.84:11101

dene14 avatar Jan 23 '19 22:01 dene14
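The recurring `no such host` errors throughout this thread are consistent with a Kubernetes headless Service that only publishes DNS records for pods that have passed their readiness probe, so peers cannot resolve each other while still starting up. A hedged mitigation sketch; the Service and namespace names are inferred from the hostnames in the logs (`dev-pilosa-headless`, `dev-pilosa`) and may not match the actual manifests:

```shell
# Ask the headless Service to publish DNS records for pods before they
# become Ready, so Pilosa peers can resolve each other during startup.
# This flips the standard spec.publishNotReadyAddresses field.
kubectl -n dev-pilosa patch service dev-pilosa-headless \
  --type merge -p '{"spec":{"publishNotReadyAddresses":true}}'
```

Equivalently, `publishNotReadyAddresses: true` could be set directly in the Service spec of the manifest.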