
WIP: containerized broker logs

avshiliaev opened this pull request 2 years ago • 5 comments

What does this PR change?

This PR should make it easier to debug the containerized broker by redirecting its logs to stdout/stderr inside the container. Optionally, we would also like to discuss the possibility of implementing a more robust log management solution for our container suite.

GUI diff

No difference.

  • [x] DONE

Documentation

  • No documentation needed: only internal and user invisible changes

  • [x] DONE

Test coverage

No tests: already covered

  • [X] DONE

Links

Fixes https://github.com/SUSE/spacewalk/issues/18428

  • [x] DONE

Changelogs

Make sure the changelogs entries you are adding are compliant with https://github.com/uyuni-project/uyuni/wiki/Contributing#changelogs and https://github.com/uyuni-project/uyuni/wiki/Contributing#uyuni-projectuyuni-repository

If you don't need a changelog check, please mark this checkbox:

  • [ ] No changelog needed

If you uncheck the checkbox after the PR is created, you will need to re-run changelog_test (see below)

Re-run a test

If you need to re-run a test, please mark the related checkbox, it will be unchecked automatically once it has re-run:

  • [ ] Re-run test "changelog_test"
  • [ ] Re-run test "backend_unittests_pgsql"
  • [ ] Re-run test "java_pgsql_tests"
  • [ ] Re-run test "schema_migration_test_pgsql"
  • [ ] Re-run test "susemanager_unittests"
  • [ ] Re-run test "javascript_lint"
  • [ ] Re-run test "spacecmd_unittests"

avshiliaev avatar Jul 28 '22 14:07 avshiliaev

@cbosdo @aaannz I've changed the configuration script so that the logs get redirected and the log level is set from the config.yaml. As for the formatting of the logs, there is an issue, though, and I would like your input on it. Also, please correct me if I'm wrong.

So basically, rather than tweaking the logging format in python/spacewalk/common/rhnLog.py just for our case, maybe we could go for a more robust solution here and add some log management to our setup. I think sooner or later we'll stumble into the good old observability problem with our containerized proxy anyway. Just the last problem with autoinstallation showed the system is a bit of a black box now.

First of all, as you might expect, container logs are a bit tricky for a couple of reasons. If we write logs into a *.log file as always, they are obviously gone with the container. If we rely on stdout, the logs get a bit fragmented: containers emit logs to the stdout and stderr streams, and the logs are then stored on the container host in JSON files by default. At least for Docker this is true; I guess it's the same for the other platforms.

  • They're stored per container, and the daemon logs are stored separately, i.e. we can have a hard time searching for an issue. I think it's a bit easier in the case of k8s/k3s, since you can get deployment-based logs AFAIK. But then again, they're stored on the host where the kubelet is running. Not optimal for a cloud scenario, I think.
  • Each container runtime has its own rotation policy, which has to be tuned manually. Again, not really optimal. We use Docker and Podman, but k8s/k3s can also work with CRI-O and containerd.
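As an illustration of the per-runtime rotation tuning mentioned above: with Docker's default json-file driver, rotation can be capped host-wide in /etc/docker/daemon.json. The values below are placeholders, not a recommendation, and Podman and the k8s runtimes each need their own, separate configuration:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```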

Here's a basic overview of the logging architecture on Kubernetes.

We could look towards a solution like the Elastic Stack. At least it is quite common out there, and it should make the logs more like the normal Uyuni logs.

What do you think?

avshiliaev avatar Jul 28 '22 15:07 avshiliaev

Here's a basic overview of the logging architecture on Kubernetes.

We could look towards a solution like the Elastic Stack. At least it is quite common out there, and it should make the logs more like the normal Uyuni logs.

What do you think?

Are you thinking about employing ELK in our testing and internal usage, or shipping it for our users? I am all for having it internally and maybe producing some whitepapers/tutorials, but exactly because of the link you posted above I think this would be wasted time on our side. We should be good container citizens and print to stdout/stderr; users are then responsible for having another service handle the logs.

Something similar can be done wrt podman/docker with e.g. systemd-journald and the systemd-journal-remote package.
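To sketch the journald idea (the container name is a placeholder and the run command is trimmed):

```shell
# Send the container's stdout/stderr to systemd-journald instead of per-container files
podman run --log-driver journald --name proxy-broker ...

# Query the logs via the CONTAINER_NAME field the journald driver attaches
journalctl CONTAINER_NAME=proxy-broker
```

From there, systemd-journal-remote could forward the journal to a central host.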

I think all we should do is differentiate, via a prefix I suppose, the logs from the different systems - http, wsgi.

aaannz avatar Jul 28 '22 16:07 aaannz

I think all we should do is to differentiate, via prefix I suppose, logs from different systems though - http, wsgi.

Totally agree, but I would have prefixes like http, broker, redirect, since there are two distinct WSGI scripts. Shipping a log stack is not our job; we have to play nicely with all of them, since users may have ELK or another one.

cbosdo avatar Jul 29 '22 06:07 cbosdo

@aaannz

It seems there are a couple of things we could do to label our log entries. The question is: at which level in a log entry do we want our label to be? I assume we want our logs to look like [broker] <timestamp> ... <log message> ..., so we need to modify the way Apache formats logs. First of all, since all logs are passed through Apache, we want to modify the corresponding module in the configuration, called mod_log_config. There's the LogFormat directive and the list of values it can access via string interpolation.

  • I couldn't find anything special there to differentiate requests, tbh. Theoretically, we set options on a request like this; can we retrieve them? Because the values are exactly broker and broker-redirect. The problem is, ErrorLogFormat has a different set of values we can access 🤷
  • Another option would be to use env variables. Both LogFormat and ErrorLogFormat can access them like
LogFormat "%{VARIABLE}e" 
ErrorLogFormat "%{VARIABLE}e %M"

The thing is, we can get normal env variables this way, but I'm not quite sure it works dynamically. It works perfectly fine when they are set in advance, but if I, for example, try to set one here like os.environ["VARIABLE"] = "broker", it doesn't work. Maybe it's just because of the order/lifecycle of the WSGI components. However, when using middleware, which is obviously called on a per-request basis, it doesn't work either.

import os
from wsgi import wsgiHandler


def application(environ, start_response):
    ...

class LoggingMiddleware:
    """Wraps the WSGI application to tag each request for Apache's logger."""

    def __init__(self, application):
        self.__application = application

    def __call__(self, environ, start_response):

        def _start_response(status, headers, *args):
            # Setting os.environ here only mutates this Python process's
            # environment; Apache's %{VARIABLE}e lookup never sees it.
            os.environ["VARIABLE"] = "broker"
            return start_response(status, headers, *args)

        return self.__application(environ, _start_response)

application = LoggingMiddleware(application)
  • We could use the SetEnvIf directive, but then again, what should be the condition?
  • We can define a separate LogFormat by either path matching or port number (VirtualHost). It's probably reliable, but I don't know. You cannot use LogFormat under an If directive though, which is a pity, because there's a whole lot of expressions available.
  • There are dedicated config files under /etc/apache2/conf.d/* and I tried to use SetEnv in there, like for proxy, for tftp-sync and so on, but they just override each other.

So it's kind of walking in circles, but it should be trivial! 😅
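For the SetEnvIf condition question above, one hypothetical answer would be matching on the request URI; the paths below are invented placeholders, not the real mount points:

```apache
# Tag requests by path so the value can later be interpolated
# as %{HANDLER_TYPE}e in LogFormat / ErrorLogFormat.
SetEnvIf Request_URI "^/broker" HANDLER_TYPE broker
SetEnvIf Request_URI "^/redirect" HANDLER_TYPE broker-redirect
```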

avshiliaev avatar Aug 05 '22 16:08 avshiliaev

* Another option would be to use env variables. Both LogFormat and ErrorLogFormat can access them like
LogFormat "%{VARIABLE}e" 
ErrorLogFormat "%{VARIABLE}e %M"

The thing is, we can get normal env variables this way, but I'm not quite sure it works dynamically. It works perfectly fine when they are set in advance, but if I, for example, try to set one here like os.environ["VARIABLE"] = "broker", it doesn't work. Maybe it's just because of the order/lifecycle of the WSGI components. However, when using middleware, which is obviously called on a per-request basis, it doesn't work either.

I think this is exactly what we should do. Let's add these env vars to the Dockerfile. If our users are not happy with our default format, they can then override them by passing custom options to podman or helm (here we will probably need to do some enablement).

aaannz avatar Aug 05 '22 22:08 aaannz

* Another option would be to use env variables. Both LogFormat and ErrorLogFormat can access them like
LogFormat "%{VARIABLE}e" 
ErrorLogFormat "%{VARIABLE}e %M"

The thing is, we can get normal env variables this way, but I'm not quite sure it works dynamically. It works perfectly fine when they are set in advance, but if I, for example, try to set one here like os.environ["VARIABLE"] = "broker", it doesn't work. Maybe it's just because of the order/lifecycle of the WSGI components. However, when using middleware, which is obviously called on a per-request basis, it doesn't work either.

I think this is exactly what we should do. Let's add these env vars to the Dockerfile. If our users are not happy with our default format, they can then override them by passing custom options to podman or helm (here we will probably need to do some enablement).

I'm a little late to the party, but those two WSGI processes are run by one single apache instance... would you be able to use two different log formats for each of them?

The only solution I would see would be to hack the rhnLog module to accept a variable containing the log prefix... This could then be configured from the env variable, as you mention. However, it's dangerous to let users modify this: it will make it harder for our tools to spot issues in the logs if an L3 support case arrives.

cbosdo avatar Aug 17 '22 08:08 cbosdo

@cbosdo thanks for taking a look, we've been also waiting for you :smile:

First of all, I don't know if we want to set anything through the Dockerfile. In the Apache conf, env variables of the system are treated together with the variables the runtime initializes, so they're all just env variables. It's a bit confusing. So using the SetEnv directive is nothing else than assigning an internal variable.

I agree it's tricky to differentiate between the scripts that process a request. I think the only viable option here would be to use the SetEnv directive under a Directory one. More on the Directory directive here. So each time a script under a certain directory gets called, our variable gets rewritten to a value hard-coded in the config itself and then passed to the logger. I've tested it already.
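A minimal sketch of that Directory-scoped approach (the directory paths are assumptions for illustration; only the variable name matches what is discussed in this thread):

```apache
# Every request handled by a script under this directory gets the
# variable set, which the log formats can pick up via %{HANDLER_TYPE}e.
<Directory "/srv/www/broker">
    SetEnv HANDLER_TYPE broker
</Directory>
<Directory "/srv/www/redirect">
    SetEnv HANDLER_TYPE broker-redirect
</Directory>
```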

avshiliaev avatar Aug 17 '22 09:08 avshiliaev

You are right, I don't know what I was thinking before. Probably just about log format instead of differentiating between those two processes. So forget about env vars.

Since we own the logging, can we just modify rhnLog so that initLOG would optionally take e.g. a component (an empty string by default) and just hardcode them for wsgi_broker and wsgi_redirect? I am for having the wsgi_ prefixes to differentiate from the salt broker (in case those log files ever get mixed).

We should IMO do this directly in rhnLog so this will affect both regular and containerized proxy.
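A rough sketch of what that initLOG change could look like. This is a simplified stand-in, not the actual rhnLog module: the real initLOG signature and internals differ, and only the optional-component idea is taken from the discussion above:

```python
# Simplified stand-in for rhnLog's module-level state: initLOG records an
# optional component tag, and message formatting prefixes it when set.
_component = ""


def initLOG(component=""):
    """Initialize logging; the component tag defaults to an empty string."""
    global _component
    _component = component


def format_message(*args):
    # rhnLog-style: concatenate all arguments into a single message,
    # then prepend the component tag if one was configured.
    message = " ".join(str(a) for a in args)
    return f"[{_component}] {message}" if _component else message


initLOG(component="wsgi_broker")
print(format_message("request", "handled"))  # [wsgi_broker] request handled
```

The same call hardcoded with "wsgi_redirect" would cover the second script, and the empty-string default keeps existing callers unchanged.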

aaannz avatar Aug 17 '22 11:08 aaannz

We should IMO do this directly in rhnLog so this will affect both regular and containerized proxy.

That would make any log analyzing tooling simpler if we have the same format in both regular and container cases indeed.

cbosdo avatar Aug 17 '22 12:08 cbosdo

Ok, we could definitely modify rhnLog together with the Apache httpd conf. The latter is only needed if we want to mark the default-server/proxy-html logs that do not use the rhnLog component.

avshiliaev avatar Aug 18 '22 14:08 avshiliaev

@cbosdo @aaannz

We have the following log format now: [%{HANDLER_TYPE}e] [%t] [%l] [pid %P] %F: %E: [client %a] %M.

  1. With this commit we can now handle the leftmost argument in the log format above, i.e. HANDLER_TYPE. It's done at the Apache httpd level to distinguish between our broker, html, and docs components and anything else on the server.
  2. This commit controls the nested message coming from the actual script, i.e. what's inside the %M argument in the format above. It turns out rhnLog concatenates multiple arguments to create a message, so nothing to change there; we just make the caller pass in another argument.

So please take a look. I've left the changelog untouched because maybe we can still add something else here.

avshiliaev avatar Aug 25 '22 15:08 avshiliaev

@aaannz thanks a lot for your feedback :+1: I think we got everything now unless you spot something else.

avshiliaev avatar Aug 29 '22 10:08 avshiliaev

@cbosdo thanks for a thorough review :+1:

avshiliaev avatar Aug 29 '22 13:08 avshiliaev