`SelectRandom` panics on single-node deployment.
The function that should return client for one of the cluster members panics if the deployment has only one member.
It seems that on single-node deployments, the size of the "cluster" (c) is 0, causing it to call rand.Intn function with an invalid argument 0.
Stacktrace:
2025/07/14 15:40:51 http: panic serving @: invalid argument to Intn
goroutine 155 [running]:
net/http.(*conn).serve.func1()
/snap/go/10828/src/net/http/server.go:1903 +0xbe
panic({0x9d55c0?, 0xbafdb0?})
/snap/go/10828/src/runtime/panic.go:770 +0x132
math/rand.(*Rand).Intn(0xab45f0?, 0xc000444660?)
/snap/go/10828/src/math/rand/rand.go:180 +0x4c
math/rand.Intn(0x0)
/snap/go/10828/src/math/rand/rand.go:453 +0x25
github.com/canonical/microcluster/v2/client.Cluster.SelectRandom({0xff8980, 0x0, 0x140?})
/root/go/pkg/mod/github.com/canonical/microcluster/[email protected]/client/cluster.go:14 +0x5b
I don't have live code with this bug anywhere, but this snippet roughly shows the reproducer.
func ovnCentralIpsUpdated(ctx context.Context, s state.State, key string, _ string) error {
errMsgPrefix := fmt.Sprintf("handling of '%s' config failed.", key)
cluster, err := s.Cluster(false)
if err != nil {
logger.Errorf("failed to trigger OVN environment refresh. %v", err)
errMsg := fmt.Sprintf(
"%s Failed to trigger environment update in the OVN cluster. "+
"Config value was not successfully applied. Please see logs for more details.",
errMsgPrefix,
)
return fmt.Errorf("%s", errMsg)
}
client := cluster.SelectRandom()
...
Hi, thanks for raising awareness on this one!
There are two more SelectRandom funcs which seem to use the same type of logic.
But what surprises me is that none of those three functions is actually used within Microcluster itself.
The one from the internal/trust package we can safely remove in v2, but for the other two we should provide an alternative implementation which doesn't panic.
For v3 I would argue we can delete all three of them, or do you want to consume such type of logic from Microcluster? Because in this case we might keep it.
@mkalcok have you recently started using this func? Because in all of our single node MicroCloud tests I have never encountered this issue.
For v3 I would argue we can delete all three of them, or do you want to consume such type of logic from Microcluster? Because in this case we might keep it.
I ended up not using it, for obvious reasons 😆. I was in a situation where, during the handling of a client request on the server side, I needed to send out another request (didn't matter to which cluster node). This seemed like a convenient way to get client. I ended up using State.Leader() instead.
@mkalcok have you recently started using this func? Because in all of our single node MicroCloud tests I have never encountered this issue.
Yes, I was just trying it out as a part of implementing a new feature in MicroOVN. I never used it before.
I see thanks. It feels like a good addition then to have a utility like this. Also as it's already in v2 we add the fix to both v3 and v2 so you could potentially start using it afterwards.
The issue is now marked as bug as we clearly need to address the panic. In addition the SelectRandom on the internal/trust package should be removed from both v2 and v3 and the third SelectRandom in rest/types should receive the same fix but can be removed only in v3 to not break v2.
I'll pick this one up next pulse :)