fdb-kubernetes-operator

Setting log group in trace logs from CLI commands for the operator

Open brownleej opened this issue 4 years ago • 12 comments

We set a log group environment variable in the docker image. This sets a field in the trace events that makes it easier to find events for the operator when the logs are aggregated in another system. This applies to the trace logs from the client, but not to the trace logs for the CLI. I think that we should add the log group to the CLI commands as well.

brownleej avatar Feb 07 '21 00:02 brownleej

I think if we solve https://github.com/FoundationDB/fdb-kubernetes-operator/issues/245, we also solve this issue, right?

johscheuer avatar Feb 17 '21 07:02 johscheuer

Not quite. I would interpret #245 as being about the FDB pods that the operator launches, whereas this would be about the CLI commands run within the operator pod itself.

brownleej avatar Feb 17 '21 16:02 brownleej

Ah right, I mixed them both up 👍

johscheuer avatar Feb 18 '21 06:02 johscheuer

CC @ltsampros I know that you looked at this. Would it be a good candidate for you to work on?

johscheuer avatar Dec 10 '21 08:12 johscheuer

@brownleej was the idea to add the log group of the cluster when running a command? Otherwise the log group is already set by the env value in the container:

<Event Severity="10" Time="1642446065.863023" DateTime="2022-01-17T19:01:05Z" Type="PingLatency" ID="0000000000000000" Elapsed="15.0062" PeerAddr="10.42.0.19:4501" MinLatency="0.000265598" MaxLatency="0.000855923" MeanLatency="0.000470336" MedianLatency="0.000434637" P90Latency="0.000662088" Count="15" BytesReceived="1080" BytesSent="1080" ConnectOutgoingCount="0" ConnectIncomingCount="0" ConnectFailedCount="0" ConnectMinLatency="0" ConnectMaxLatency="0" ConnectMeanLatency="0" ConnectMedianLatency="0" ConnectP90Latency="0" Machine="10.42.0.20:1" LogGroup="fdb-kubernetes-operator" />

And that fits the description of Env at https://pkg.go.dev/os/exec#Cmd:

// Env specifies the environment of the process.
// Each entry is of the form "key=value".
// If Env is nil, the new process uses the current process's
// environment.
// If Env contains duplicate environment keys, only the last
// value in the slice for each duplicate key is used.
// As a special case on Windows, SYSTEMROOT is always added if
// missing and not explicitly set to the empty string.
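
The inheritance rule quoted above can be checked with a minimal sketch. It assumes a Unix-like system with `sh` on the PATH, and the helper name `inheritedByChild` is hypothetical (it is not part of the operator). It only shows that a child process started with a nil `Env` sees the parent's variable; it says nothing about whether a given fdbcli version reads that variable.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// inheritedByChild sets key=value in the current process, starts a child
// process with cmd.Env left nil, and returns the value the child sees.
// Hypothetical helper for illustration only.
func inheritedByChild(key, value string) (string, error) {
	if err := os.Setenv(key, value); err != nil {
		return "", err
	}
	// cmd.Env is never assigned, so it stays nil and the child uses the
	// current process's environment, as the os/exec documentation states.
	cmd := exec.Command("sh", "-c", "printf '%s' \"$"+key+"\"")
	out, err := cmd.Output()
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	seen, err := inheritedByChild("FDB_NETWORK_OPTION_TRACE_LOG_GROUP", "fdb-kubernetes-operator")
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("child sees log group:", seen)
}
```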

johscheuer avatar Jan 17 '22 19:01 johscheuer

The idea is to pass the log group from the environment variable to the CLI commands, because I don't think they pick up the log group from the environment. I reproduced this locally, where I see the following trace event from a CLI command:

trace.10.1.15.185.51.1642545962.u4WGvV.0.1.xml:<Event Severity="10" Time="1642545962.462848" DateTime="2022-01-18T22:46:02Z" Type="CLIProgramStart" ID="0000000000000000" SourceVersion="a461b9c93be19f846c2c41d9de455f968b53fd6d" Version="6.3.10" PackageName="6.3" ActualTime="1642545962" ClusterFile="/tmp/1322091668" ConnectionString="test_cluster:[email protected]:4501,10.1.15.187:4501,10.1.15.192:4501" CommandLine="/usr/bin/fdb/6.3/fdbcli --exec configure new double ssd-2 usable_regions=1 logs=3 proxies=3 resolvers=1 log_routers=-1 remote_logs=-1 regions=[] -C /tmp/1322091668 --log --trace_format xml --timeout 10 --log-dir /var/log/fdb" Machine="10.1.15.185:51" LogGroup="default" TrackLatestType="Original" />

brownleej avatar Jan 18 '22 22:01 brownleej

Hm, that's interesting. My test actually showed that the LogGroup was set, but let me confirm that. In addition, the env variable should already be passed down to fdbcli. By locally, I assume you mean that the operator is running in a container in a local Kubernetes cluster, and that you verified that the LogGroup is set as an env variable in the container and is not overwritten or removed somewhere?

johscheuer avatar Jan 19 '22 07:01 johscheuer

@johscheuer I came to the same conclusion as you when I last looked at this: LogGroup is propagated through the env vars to the executed command, and I'm fine as long as that is set. @johscheuer @brownleej Does it make sense to further amend this so that LogGroup also contains the cluster name?

ltsampros avatar Jan 19 '22 07:01 ltsampros

Hm, that's interesting. My test actually showed that the LogGroup was set, but let me confirm that. In addition, the env variable should already be passed down to fdbcli. By locally, I assume you mean that the operator is running in a container in a local Kubernetes cluster, and that you verified that the LogGroup is set as an env variable in the container and is not overwritten or removed somewhere?

Yes, I was running this in a local Kubernetes cluster. I saw other trace events that contained the log group fdb-kubernetes-operator, which is what is set for FDB_NETWORK_OPTION_TRACE_LOG_GROUP in the docker image, and inside the running container.

brownleej avatar Jan 20 '22 17:01 brownleej

I'll try to verify that again and see if there are any logs that have the wrong LogGroup. If the LogGroup is set correctly in all logs I would go ahead and close the issue.
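
One way to do that check is sketched below. It assumes XML trace files with one `<Event ... />` element per line, as in the events pasted earlier in this thread. The function name `unexpectedLogGroups` is hypothetical, and the two sample events are abbreviated versions of the ones above, keeping only the `Type` and `LogGroup` attributes.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// traceEvent captures only the attributes needed for this check.
type traceEvent struct {
	Type     string `xml:"Type,attr"`
	LogGroup string `xml:"LogGroup,attr"`
}

// unexpectedLogGroups returns the Type of every event in trace whose
// LogGroup attribute differs from want. Lines that are not <Event ... />
// elements are skipped.
func unexpectedLogGroups(trace, want string) ([]string, error) {
	var mismatched []string
	for _, line := range strings.Split(trace, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "<Event ") {
			continue
		}
		var ev traceEvent
		if err := xml.Unmarshal([]byte(line), &ev); err != nil {
			return nil, err
		}
		if ev.LogGroup != want {
			mismatched = append(mismatched, ev.Type)
		}
	}
	return mismatched, nil
}

func main() {
	// Abbreviated sample events; real trace events carry many more attributes.
	sample := `<Event Severity="10" Type="PingLatency" LogGroup="fdb-kubernetes-operator" />
<Event Severity="10" Type="CLIProgramStart" LogGroup="default" />`

	wrong, err := unexpectedLogGroups(sample, "fdb-kubernetes-operator")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(wrong) // prints [CLIProgramStart]
}
```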

@johscheuer @brownleej Does it make sense to further amend this so that LogGroup also contains the cluster name?

Do you mean using the same LogGroup as the cluster, or a special LogGroup that appends the cluster name? I think the first option, using the same LogGroup as the cluster, would be a good approach: it allows querying the log system by machine (the operator) together with the specific log group for the cluster.

johscheuer avatar Jan 21 '22 06:01 johscheuer

I'll try to verify that again and see if there are any logs that have the wrong LogGroup. If the LogGroup is set correctly in all logs I would go ahead and close the issue.

@johscheuer @brownleej Does it make sense to further amend this so that LogGroup also contains the cluster name?

Do you mean using the same LogGroup as the cluster, or a special LogGroup that appends the cluster name? I think the first option, using the same LogGroup as the cluster, would be a good approach: it allows querying the log system by machine (the operator) together with the specific log group for the cluster.

Either approach is fine for me. I have a slight preference for the latter approach, to help narrow down searches, e.g. LogGroup="fdb-kubernetes-operator-$CLUSTER_LOGGROUP" or similar.
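
As a rough sketch of that combined form, the value could be built per cluster and appended to the child's environment. This relies on the documented os/exec behaviour that the last value wins for duplicate keys. The helper names `combinedLogGroup` and `commandWithLogGroup` are hypothetical, the cluster log group `test_cluster` is a placeholder, and `sh` stands in for the CLI binary; whether fdbcli honours the variable depends on its version.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

const logGroupEnv = "FDB_NETWORK_OPTION_TRACE_LOG_GROUP"

// combinedLogGroup joins the operator's log group and the cluster's log
// group with a dash. An empty cluster group leaves the operator group as-is.
func combinedLogGroup(operatorGroup, clusterGroup string) string {
	if clusterGroup == "" {
		return operatorGroup
	}
	return operatorGroup + "-" + clusterGroup
}

// commandWithLogGroup builds a command whose environment is the parent's
// environment plus an overriding log group entry. Because the entry is
// appended last, it takes precedence over any inherited value.
func commandWithLogGroup(logGroup, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Env = append(os.Environ(), logGroupEnv+"="+logGroup)
	return cmd
}

func main() {
	group := combinedLogGroup("fdb-kubernetes-operator", "test_cluster")
	// The shell only echoes the variable back to show what a child sees.
	cmd := commandWithLogGroup(group, "sh", "-c", "printf '%s' \"$"+logGroupEnv+"\"")
	out, err := cmd.Output()
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println(string(out))
}
```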

ltsampros avatar Jan 21 '22 12:01 ltsampros

Once https://github.com/apple/foundationdb/pull/6311 is merged, we can add the feature for new releases.

johscheuer avatar Jan 31 '22 16:01 johscheuer

This has been fixed in https://github.com/apple/foundationdb/pull/6320 and newer versions of fdbcli (including this fix) will work as expected.

johscheuer avatar Apr 21 '23 07:04 johscheuer