
Decreasing log verbosity

Open nh2 opened this issue 6 years ago • 5 comments

My stolon cluster is within the same data center and I'd like failovers to be detected and performed immediately, so I have set:

  • sleepInterval "1s"
  • requestTimeout "1s"
  • failInterval "1s"
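
For reference, these cluster spec values can be changed at runtime with stolonctl's update command. A minimal sketch, assuming an etcdv3 store backend and a cluster named stolon-cluster (both placeholders, adjust to your setup):

stolonctl --cluster-name=stolon-cluster --store-backend=etcdv3 update --patch \
  '{"sleepInterval": "1s", "requestTimeout": "1s", "failInterval": "1s"}'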

Seems to be working great, but now I get very frequent logs:

Jul 31 21:01:32 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:32Z keeper.go:1269: our db requested role is standby followedDB=da32b492
Jul 31 21:01:32 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:32Z keeper.go:1288: already standby
Jul 31 21:01:32 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:32Z keeper.go:1410: postgres parameters not changed
Jul 31 21:01:33 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:33Z keeper.go:1269: our db requested role is standby followedDB=da32b492
Jul 31 21:01:33 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:33Z keeper.go:1288: already standby
Jul 31 21:01:33 node-2 stolon-keeper[1123]: [I] 2017-07-31T21:01:33Z keeper.go:1410: postgres parameters not changed

Is it possible to decrease the log frequency of these "everything is OK" messages without increasing the latency for failovers?

For example, it'd be great if I could configure it to print only once per minute that everything is still OK.

If not, I'd like to feature-request that.

nh2 avatar Jul 31 '17 21:07 nh2

@nh2

The logs are printed at every check, so they can't be changed to print only at specific intervals without adding more code. They could be moved to the debug level, but I'm on the fence about doing this, since they are useful to see that the keeper is operating and what its current requested state is, and they are useful when a state change is about to happen.
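
For illustration, such interval-based printing might look something like the Go sketch below; this is hypothetical code, not stolon's actual keeper code. A state change is logged immediately, while an unchanged state is repeated at most once per interval:

package main

import (
    "log"
    "time"
)

type stateLogger struct {
    interval  time.Duration // how often to repeat an unchanged state
    lastState string
    lastLog   time.Time
}

func (s *stateLogger) logState(state string) {
    // Log immediately on a state change; otherwise repeat at most once per interval.
    if state != s.lastState || time.Since(s.lastLog) >= s.interval {
        log.Printf("[I] %s", state)
        s.lastState = state
        s.lastLog = time.Now()
    }
}

func main() {
    sl := &stateLogger{interval: time.Minute}
    for i := 0; i < 3; i++ {
        sl.logState("our db requested role is standby") // printed once, then suppressed
        time.Sleep(100 * time.Millisecond)
    }
}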

As a note:

so I have set:

  • sleepInterval "1s"
  • requestTimeout "1s"
  • failInterval "1s"

I wouldn't suggest using these values; failInterval should be at least 3 times sleepInterval, or a single slow update from the keeper to etcd (for various reasons) will trigger a failover. Currently the proxy check interval is hardcoded to 5s and the proxy timeout to 15s, so clients will also wait about 5 seconds during a failover before opening connections to the new master.
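
Following that guidance, a less fragile configuration could look like this (a sketch, reusing the hypothetical stolonctl invocation from above; the values are illustrative and keep failInterval at 3 times sleepInterval):

stolonctl --cluster-name=stolon-cluster --store-backend=etcdv3 update --patch \
  '{"sleepInterval": "1s", "requestTimeout": "2s", "failInterval": "3s"}'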

sgotti avatar Aug 08 '17 07:08 sgotti

They could be moved to the debug level, but I'm on the fence about doing this, since they are useful to see that the keeper is operating and what its current requested state is, and they are useful when a state change is about to happen.

I agree, they are useful, and I'd like to see them regularly, but at every check seems too much for my use case: with fast intervals I literally get kilometers of nothing but stolon "everything is OK" messages in the logs.

Currently the proxy check interval is hardcoded to 5s and the proxy timeout to 15s, so clients will also wait about 5 seconds during a failover before opening connections to the new master.

Would it be possible to make these configurable as well? I have very reliable 0.2 ms latency in the location where I run stolon, and my application would benefit a lot from very fast failovers. Having a couple of unnecessary failovers is OK in my use case (and I'd really like to test stolon failover a lot in that cluster, so they'd even be beneficial).

nh2 avatar Aug 29 '17 13:08 nh2

I agree, they are useful, and I'd like to see them regularly, but at every check seems too much for my use case: with fast intervals I literally get kilometers of nothing but stolon "everything is OK" messages in the logs.

The fastest way, since logs shouldn't be used to monitor application health, would be to move them to the debug level.

Would it be possible to make these configurable as well? I have very reliable 0.2 ms latency in the location where I run stolon, and my application would benefit a lot from very fast failovers. Having a couple of unnecessary failovers is OK in my use case (and I'd really like to test stolon failover a lot in that cluster, so they'd even be beneficial).

There's already an open issue: #314. For making them configurable we should also add some validation to keep the timeout at least 3 times the check interval. But if your store becomes slow, or there's a store master election (e.g. when a store member dies), an update could take longer than the timeout and the proxy would time out, closing all the connections.
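
The validation itself would be straightforward; a hypothetical Go sketch (names are illustrative, not from stolon's code base):

package main

import (
    "fmt"
    "time"
)

// validateProxyTimings enforces the rule discussed above: the proxy timeout
// must be at least 3 times the proxy check interval.
func validateProxyTimings(checkInterval, timeout time.Duration) error {
    if timeout < 3*checkInterval {
        return fmt.Errorf("proxy timeout %v must be at least 3 times the check interval %v",
            timeout, checkInterval)
    }
    return nil
}

func main() {
    fmt.Println(validateProxyTimings(5*time.Second, 15*time.Second)) // <nil>: the current hardcoded values pass
    fmt.Println(validateProxyTimings(5*time.Second, 10*time.Second)) // error: timeout too small
}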

sgotti avatar Aug 30 '17 17:08 sgotti

We would also love to see this configurable. We currently use Skaffold for development, which merges the logs from all pods into one terminal output, and so much scrolling makes it pretty much unusable. It would be great if repeating messages could be moved to the DEBUG level instead.

In the meantime, I am decreasing verbosity by setting the PG log level to "FATAL" during Stolon cluster init in development, which is less than ideal, of course:

kubectl run -i -t stolonctl --image=sorintlab/stolon:v0.13.0-pg11 --restart=Never \
  --rm -- /usr/local/bin/stolonctl --cluster-name=kube-stolon --store-backend=kubernetes \
  --kube-resource-kind=configmap init -y '{"initMode": "new", "pgParameters": {"log_min_messages": "FATAL"}}'
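
Note that log_min_messages is a PostgreSQL parameter, so this quiets PostgreSQL's own logging rather than the keeper's [I] messages quoted above. The same parameter can also be patched on an already-initialized cluster instead of at init time; a sketch using the same invocation style:

kubectl run -i -t stolonctl --image=sorintlab/stolon:v0.13.0-pg11 --restart=Never \
  --rm -- /usr/local/bin/stolonctl --cluster-name=kube-stolon --store-backend=kubernetes \
  --kube-resource-kind=configmap update --patch '{"pgParameters": {"log_min_messages": "FATAL"}}'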

demisx avatar Apr 24 '19 14:04 demisx

Debug is OK, but I think it should log only when the status changes.

icefed avatar Jun 25 '19 06:06 icefed