osbuild-composer
osbuild-composer copied to clipboard
main: journald support with external correlation
This patch adds native journald support through logrus which enables to work with structured logs. It uses a dependency that is already in the project (coreos/systemd-go
) which is a pure Go implementation. New log type is introduced named "journal" and it is now the default with fallback to "text" if journald is not present on the system.
Structured logging enabled me to work with logrus fields that can help with correlating log records on standalone or hosted deployments. Already existing "operation_id" field (random string) is now fully propagated via echo context logger and new logrus hook extracts it from Go standard context.
A new field called "external_id" is extracted from HTTP header X-External-Id
when present. The idea is to set this to X-Edge-Request-Id on hosted image builder so logs can be easily correlated across both services.
Additionally, each cloud API request is now logged using the built-in echo logging middleware which creates entries like Processed request GET /path
. Together with journald this can be tested end-to-end:
# curl -H "X-External-Id: XXXX" --unix-socket /run/cloudapi/api.socket -XGET http://localhost/api/image-builder-composer/v2/openapi
This will create the following entries:
# journalctl -e -o json-pretty --output-fields=OPERATION_ID,MESSAGE,EXTERNAL_ID
{
"OPERATION_ID" : "2eJwe1mdLa9s2rB3PVZbVAttQi1",
"EXTERNAL_ID" : "XXXX",
"MESSAGE" : "Returning OpenAPI document"
}
{
"MESSAGE" : "Processed request GET /api/image-builder-composer/v2/openapi",
"EXTERNAL_ID" : "XXXX",
"OPERATION_ID" : "2eJwe1mdLa9s2rB3PVZbVAttQi1",
}
Users can easily dig logs per request or operation:
# journalctl -u osbuild-composer OPERATION_ID=2eJwe1mdLa9s2rB3PVZbVAttQi1 -o cat
Returning OpenAPI document
Processed request GET /api/image-builder-composer/v2/openapi
TODO
- [ ] adequate testing for the new functionality or fixed issue
- [ ] adequate documentation informing people about the change such as
- [ ] submit a PR for the READMEs listed here
- [ ] submit a PR for the osbuild.org website repository if this PR changed any behavior not covered by the automatically updated READMEs
Moved all comments into the codebase, broken into three commits:
- Documentation and gitignore update
- Adding journald native support
- Propagating operation and external id through context to logs (this one is hard to break up into two as I had to do updates to existing operation id in order to efficiently implement external id)
Damn that was a misclick, amended a rebase.
+ sudo composer-cli blueprints depsolve ostree
ERROR: Depsolve Error: Get "http://localhost/api/v1/blueprints/depsolve/ostree": EOF
and then an AWS test which I don’t understand why fails
and then an AWS test which I don’t understand why fails
iot-bootable-container
failed with
dial tcp: lookup cdn03.quay.io: no such host
which looks like a plain old network hiccup. Annoying.
@ezr-ondrej please do not rebase, @achilleas-k plans to only restart failed tests. is that right? thanks.
@ezr-ondrej please do not rebase, @achilleas-k plans to only restart failed tests. is that right? thanks.
It's a tricky decision. It's good to rebase when there's a lot of changes in main and there's a chance the rebase might break things, but OTOH the CI runtimes are long and annoying.
A merge queue would be great but that doesn't work with flaky testing.
Yeah, sorry! I've got new buttons in Gitlab now, I'll push those instead xD
Only schulzbot is red, seems only 8.4 is red, likely not caused by this PR.
likely not caused by this PR.
Yeah, it's a timout, very unlikely to be caused by this PR, but it means the tests on 8.4 didn't run on this PR :(
There's a known issue with that test but I thought we fixed it already.
Rebased.
The main pipeline is green!! Woooot!
Well bunch of tests failed, someone help me to understand if these are relevant. These logs are hard to read.
Since all the edge tests are failing but the main gitlab CI pipeline succeeded, it might something with the edge tests themselves or something in the PR changed something the tests rely on.
I'm seeing at least one instance of the issue described in #4094
Oh, we're hitting the old:
error: Postprocessing and committing: Finalizing rootfs: Hardlinking rpmdb to base location: Hardlinking /usr/share/rpm to /usr/lib/sysimage/rpm-ostree-base-db: Analyzing /usr/share/rpm/ content: File exists (os error 17)\
Traceback (most recent call last):\
File \\\"/run/osbuild/bin/org.osbuild.ostree.commit\\\", line 68, in <module>\
r = main(args[\\\"inputs\\\"],\
File \\\"/run/osbuild/bin/org.osbuild.ostree.commit\\\", line 55, in main\
subprocess.run(argv,\
File \\\"/usr/lib64/python3.9/subprocess.py\\\", line 528, in run\
raise CalledProcessError(retcode, process.args,\
subprocess.CalledProcessError: Command '['rpm-ostree', 'compose', 'commit', '--repo=/run/osbuild/tree/repo', '--add-metadata-string=version=9-stream', '--add-metadata-string=rpmostree.inputhash=1e1524b04aaad2bbdc26c3d2a1d43e1bf5c95ae327bf399af3815389b277f64d', '--write-composejson-to=/run/osbuild/tree/compose.json', '/tmp/tmp1f6ir9sg.json', '/run/osbuild/tree/tmp22cqxixc']' returned non-zero exit status 1.\
\", \"error\": null}}}]" jobId=b11a4709-a108-40d8-8f81-69a07e79ac94
Commit 6f69cacdc5963aa772e62a37c00fa27157a5b7f3 updated the sources which introduces a version of rpm-ostree that causes an issue we're aware of. The issue has been fixed but we've been waiting for it to reach Fedora 39 and CS9 repos before updating sources. But since edge tests aren't required for merging and we needed newer sources for other reasons, I suppose all our edge tests are failing on F39 and CS9 now.
This PR is stale because it has been open 30 days with no activity. Remove "Stale" label or comment or this will be closed in 7 days.
This PR is stale because it has been open 30 days with no activity. Remove "Stale" label or comment or this will be closed in 7 days.
@achilleas-k WDYT about this one? it'd be great to get it in :)
@achilleas-k WDYT about this one? it'd be great to get it in :)
Yes, we should get this merged. It seems it got caught in CI flakiness purgatory. Let's do a rebase and full run to get it in.
Just rpm-build:centos-stream-10-ppc64le
failed
Thank you @achilleas-k and @lzap ! :orange_heart: