glusterd2

CI: Dump logs on console output

Open prashanthpai opened this issue 6 years ago • 6 comments

The CentOS CI tests have to preserve logs and other artefacts to help with debugging.

prashanthpai avatar Apr 17 '18 03:04 prashanthpai

@phlogistonjohn Would we be able to complete this in a week's time or do you want me to move this to Sprint 2?

atinmu avatar Jul 09 '18 02:07 atinmu

Let's move this one. I don't think it is going to be difficult to implement but there are a number of choices to be made that I think we ought to discuss first.

phlogistonjohn avatar Jul 09 '18 17:07 phlogistonjohn

So now that this is in sprint2 and we have a little breathing room I'd like to discuss how this ought to work.

I see the following entities potentially generating logging data:

  • glusterd2 (N instances, depends on tests)
  • other gluster processes (other gluster daemons that glusterd2 interacts with, brick processes, self heal, etc... number depends on tests)
  • etcd (when using external etcd, typically one process)
  • the test code itself (I think it is reasonable for the test drivers to produce their own logging [1])

For some complex tests that run more than one glusterd2 process and perform enough operations to produce a good amount of logging the output produced could be quite overwhelming.

So I'm not 100% sold on the idea that we can just dump logs to console and call it a day.

Here's my initial proposal, please let me know what you think.

  • Basic logging support will be added to the tests themselves so that tests can log what they're doing with timestamps, etc. By default this logging will not be printed out. Command line switches will control where the test logging goes (actual switches tbd).
  • Helper functions will be added to the utils library to make logging tests less painful [1]
  • A tool to gather up the relevant other log files and any other files that we think we'd need in debugging the tests (in a tarball for example) will be added to the repo
  • The tool will run after the tests complete
  • We tweak the centos ci config to retain the gathered logs as part of the jenkins job

[1] I'd be happy to use the log functions built into testing if they actually printed when they were called, instead of buffering until the end of the test, and if the output were customizable. Personally, I find the testing framework in Go pretty simplistic and inflexible. I don't like the idea of having to add More Code (tm) to the framework but I don't see a better way to do it ... suggestions welcome.

phlogistonjohn avatar Jul 11 '18 22:07 phlogistonjohn

> So I'm not 100% sold on the idea that we can just dump logs to console and call it a day.

> I don't like the idea of having to add More Code (tm) to the framework but I don't see a better way to do it ... suggestions welcome.

I should've been clearer in the issue description. What I meant was: add a link to, or the location of, the logs (tarballed and compressed) to the console output if the tests fail. The proposal is not to add any code to the framework; that would be very welcome, but it's not a priority now :) We're looking to debug tests that fail only on the CI machine and not on local runs (one such test is TestBitrot/Replica-volume).

> We tweak the centos ci config to retain the gathered logs as part of the jenkins job

This is exactly what we want. The glusterfs CI supports this. An example test run here: https://build.gluster.org/job/centos7-regression/1653/

17:11:38 Cores and build archived in http://builder101.cloud.gluster.org/archived_builds/build-install-centos7-regression-1653.tar.bz2

@kshlm may be able to help you with the CI config changes, as he has the necessary credentials, while you could focus on gathering the logs and segregating them. Two approaches I could think of:

  • Make each test have its own workdir - gd2 instances' localstatedirs and etcd dirs would be subdirs of each test's workdir. Create a "base directory" which will contain all workdirs of tests. All of the contents of that "base directory" can be tarballed at the end of test run by extras/centos-ci.sh and made available.
  • Add a deferred function to each test which collects and compresses logs of that test run before wiping it off or... make this part of tearDownCluster. We do not run tests in parallel, so this should be okay.

> For some complex tests that run more than one glusterd2 process and perform enough operations to produce a good amount of logging the output produced could be quite overwhelming.

Valid point, but such complexity is limited for now. The request logging interleaved with timestamps helps, though. The glusterfs CI infra/scripts inject test/command that is being run directly into the logs on a best effort basis, which is kinda cool and helps a lot. At this point, for glusterd2, that's a nice thing to have but not a priority :)

Example CI log dump from glusterfs (decompresses to about 800MB) for reference: https://build.gluster.org/job/centos7-regression/1618/artifact/glusterfs-logs.tgz

prashanthpai avatar Jul 12 '18 05:07 prashanthpai

+1 thanks for confirming what I was hoping for.

> The glusterfs CI infra/scripts inject test/command that is being run directly into the logs on a best effort basis, which is kinda cool and helps a lot.

I'm not sure what you mean by this. Do you have an example?

phlogistonjohn avatar Jul 12 '18 22:07 phlogistonjohn

I'm gonna take a stab at this soon. It's hard to debug without logs.

prashanthpai avatar Sep 05 '18 04:09 prashanthpai