Re-create connection picker on shard count change

Open martin-sucha opened this issue 4 years ago • 4 comments

We have seen the following panic:

panic: scylla: 10.127.248.9:9042 invalid number of shards

goroutine 43250910 [running]:
github.com/gocql/gocql.(*scyllaConnPicker).Put(0xc009dde630, 0xc06f5dd520)
	/go/pkg/mod/github.com/kiwicom/[email protected]/scylla.go:340 +0x42e
github.com/gocql/gocql.(*hostConnPool).connect(0xc072c16980)
	/go/pkg/mod/github.com/kiwicom/[email protected]/connectionpool.go:539 +0x2f0
github.com/gocql/gocql.(*hostConnPool).fill(0xc072c16980)
	/go/pkg/mod/github.com/kiwicom/[email protected]/connectionpool.go:390 +0x17c
github.com/gocql/gocql.(*policyConnPool).addHost(0xc000b574a0, 0xc0acaa0d00)
	/go/pkg/mod/github.com/kiwicom/[email protected]/connectionpool.go:238 +0x10f
github.com/gocql/gocql.(*Session).startPoolFill(0xc000713000, 0xc0acaa0d00)
	/go/pkg/mod/github.com/kiwicom/[email protected]/events.go:277 +0x2d
github.com/gocql/gocql.(*Session).addNewNode(0xc000713000, {0xc07b470c80, 0x4, 0x4}, 0xc000a91ea8)
	/go/pkg/mod/github.com/kiwicom/[email protected]/events.go:202 +0xe7
github.com/gocql/gocql.(*Session).handleNewNode(0xc000713000, {0xc07b470c80, 0xc0ac04c350, 0xc}, 0x6)
	/go/pkg/mod/github.com/kiwicom/[email protected]/events.go:224 +0x99
github.com/gocql/gocql.(*Session).handleNodeEvent(0x100000000000000, {0xc0b47de000, 0x2, 0xc0abc41b38})
	/go/pkg/mod/github.com/kiwicom/[email protected]/events.go:169 +0x1b3
created by github.com/gocql/gocql.(*eventDebouncer).flush
	/go/pkg/mod/github.com/kiwicom/[email protected]/events.go:67 +0xb5

when we replaced a server node with a new one that had a different CPU core count but the same IP address as the old node.
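
To make the failure mode concrete, here is a minimal sketch of the idea in the issue title. It is not gocql code: `conn`, `picker`, `pool`, and their methods are hypothetical stand-ins, and the only assumptions are that a picker is sized for a fixed shard count and that each new connection reports the shard count the node currently advertises. Instead of panicking on a mismatch, the pool discards the old picker and builds a new one.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for a driver connection. It records the
// shard it is bound to and the shard count the node advertised on connect.
type conn struct {
	shard    int
	nrShards int
}

// picker is a hypothetical, simplified connection picker with one slot per shard.
type picker struct {
	nrShards int
	conns    []*conn
}

func newPicker(nrShards int) *picker {
	return &picker{nrShards: nrShards, conns: make([]*conn, nrShards)}
}

// put stores c in its shard slot. On a shard count mismatch it returns an
// error rather than panicking, so the caller can decide how to recover.
func (p *picker) put(c *conn) error {
	if c.nrShards != p.nrShards {
		return fmt.Errorf("invalid number of shards: picker has %d, connection reports %d",
			p.nrShards, c.nrShards)
	}
	if c.shard < 0 || c.shard >= p.nrShards {
		return fmt.Errorf("shard %d out of range [0, %d)", c.shard, p.nrShards)
	}
	p.conns[c.shard] = c
	return nil
}

// pool is a hypothetical per-host pool that owns a picker.
type pool struct {
	picker *picker
}

// add registers a new connection. If the connection reports a different shard
// count than the current picker was built for (for example, the node was
// replaced by a machine with more cores behind the same IP), the old picker is
// dropped and a fresh one is created. A real driver would also have to close
// the connections held by the old picker. recreated is true only when an
// existing picker was replaced.
func (hp *pool) add(c *conn) (recreated bool, err error) {
	if c.nrShards <= 0 {
		return false, fmt.Errorf("connection reports non-positive shard count %d", c.nrShards)
	}
	if hp.picker == nil || hp.picker.nrShards != c.nrShards {
		recreated = hp.picker != nil
		hp.picker = newPicker(c.nrShards)
	}
	return recreated, hp.picker.put(c)
}

func main() {
	hp := &pool{}
	// First connection to a 4-shard node creates the picker.
	_, _ = hp.add(&conn{shard: 0, nrShards: 4})
	// The node is replaced by an 8-shard machine with the same address.
	recreated, err := hp.add(&conn{shard: 5, nrShards: 8})
	fmt.Println(recreated, err, hp.picker.nrShards)
}
```

Whether dropping the old picker is safe in the real driver depends on how in-flight requests and existing connections are handled, which this sketch deliberately leaves out.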

martin-sucha, Oct 07 '21 16:10

I haven't tried compiling or running this code yet. I'd like to start a discussion about possible solutions.

martin-sucha, Oct 07 '21 17:10

I'm not sure if that would solve the problem.

mmatczuk, Oct 08 '21 07:10

> I'm not sure if that would solve the problem.

@mmatczuk Why? Could you please elaborate on what issues you see with this code?

Do you have some alternatives in mind?

martin-sucha, Oct 11 '21 11:10

@martin-sucha Shouldn't this work in such a way that the driver refreshes the metadata about the node? To me this looks like the metadata refresh is not working. (I filed a similar bug about down or joining nodes that don't get properly marked in the client's topology, so the client tries to connect to them while they are down.)

tarzanek, Dec 13 '23 14:12