opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

journald - Consider parsing more known fields from logs

Open djaglowski opened this issue 3 years ago • 8 comments

...

I'm looking at the journald input, which currently dumps everything into the body. The only thing it pulls out is the timestamp.

Ideally (IMHO), the body would be a string containing MESSAGE and all the other fields stored as attributes. Additionally, the priority would be converted to an OpenTelemetry severity. Given journald has a well defined list of fields [1], many of them could be converted to semantic conventions such as process.pid.

So, the question is whether things like the journald input should convert messages to a more native opentelemetry format? If not, should there be an option, operator or processor that could do it all in one go?

(I can create a PR to change the journald input if there is agreement)

  1. https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html

Originally posted by @gregoryfranklin in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/3129#issuecomment-1017782954

djaglowski avatar Jan 20 '22 22:01 djaglowski

I would like to see journald logs to be parsed right OOTB. I don't know why it shouldn't be done right at the source without any additional operator or processor. Maybe we can provide an option to keep the raw message instead of parsing if user wants that?

dmitryax avatar Jan 21 '22 00:01 dmitryax

I support that the operator ought to do further parsing as well. We already do this in syslog_input to an extent. Some users may prefer the collector to do less parsing, but unless someone is asking for it, I think it's reasonable to add "don't parse" controls later as necessary.

Mapping from priority to severity, the body being only the MESSAGE string, and everything else going to attributes makes sense to me. I also agree with applying semantic conventions as they currently apply. In the long term, I assume we'll have semantic conventions for most/all the fields, and I would personally like to see this operator apply them wherever possible.

What's less clear to me is whether we should work ahead of the semantic conventions. Given that the library is not yet stable and that development can easily outpace semantic debate, I think it's probably reasonable to make a best effort at appropriately naming and then propose the attributes to the spec repo.

What thoughts do others have on all this?

djaglowski avatar Jan 21 '22 04:01 djaglowski

I fully agree on the first two paragraphs.

What's less clear to me is whether we should work ahead of the semantic conventions. Given that the library is not yet stable and that development can easily outpace semantic debate, I think it's probably reasonable to make a best effort at appropriately naming and then propose the attributes to the spec repo.

I see your point, but from the other hand we can cause additional breaking changes to the attributes that the library emits if our best effort doesn't survive the semantic debate. I would go with the semantic conventions first, or in parallel, to make sure that once the change to the library is released at least attribute names are not going to change.

dmitryax avatar Jan 21 '22 05:01 dmitryax

I'll start with a PR that moves all the fields to attributes instead of body and sets the severity.

So, it will change from this

"entry": {
  "timestamp": "2020-04-16T11:05:49.516168-04:00",
  "body": {
    "CODE_FILE": "../src/core/unit.c",
    "CODE_FUNC": "unit_log_success",
    "CODE_LINE": "5487",
    "MESSAGE": "var-lib-docker-overlay2-bff8130ef3f66eeb81ce2102f1ac34cfa7a10fcbd1b8ae27c6c5a1543f64ddb7-merged.mount: Succeeded.",
    "MESSAGE_ID": "7ad2d189f7e94e70a38c781354912448",
    "PRIORITY": "6",

to this

"entry": {
  "timestamp": "2020-04-16T11:05:49.516168-04:00",
  "body": "var-lib-docker-overlay2-bff8130ef3f66eeb81ce2102f1ac34cfa7a10fcbd1b8ae27c6c5a1543f64ddb7-merged.mount: Succeeded.",
  "severity": 9,
  "severity_text", "INFO",
  "attributes": {
    "CODE_FILE": "../src/core/unit.c",
    "CODE_FUNC": "unit_log_success",
    "CODE_LINE": "5487",
    "MESSAGE_ID": "7ad2d189f7e94e70a38c781354912448",

Some of the fields are also clearly resource attributes rather than log attributes, but I think that is linked to the semantic conventions discussion so will leave for a later PR.

I agree it makes sense to wait for the discussion on semantic conventions to move further along before applying these to the library. In the spec there is an example mapping for syslog (and many other common logging formats) - it would be nice to see the same for journald.

gregoryfranklin avatar Jan 21 '22 11:01 gregoryfranklin

Has there been any movement on this? If not, I would be happy to jump in. I have been working on integrating Journald into a few projects and it has been frustrating trying to move all fields over to attributes.

jsirianni avatar Aug 04 '22 13:08 jsirianni

I don't see any reason why this shouldn't move forward. Putting this info in attributes is clearly an improvement in terms of respecting the data model.

djaglowski avatar Aug 04 '22 13:08 djaglowski

I had a go at this a while back but the decision was to open an issue against the specification to get the missing semantic conventions defined. You'll see in the PR (https://github.com/open-telemetry/opentelemetry-log-collection/pull/353) that I identified all the well-defined fields from journald and which of them could be mapped to existing semantic conventions.

But, I've not had time to follow up on it in terms of opening an issue against the spec or anything.

gregoryfranklin avatar Aug 04 '22 13:08 gregoryfranklin

There is an ongoing discussion about whether and how semantic conventions should be defined on a per component basis, so it is not even fully settled that these attributes will ever be codified. At this point, I don't think we should be waiting on semantic conventions. Many other components are moving forward with best effort attribute naming. This is an alpha components anyways, so names are not irreversible.

What we know for sure at this point is that this integration is not adhering to the data model's direction that first-party libraries should emit logs with the body as a string. We should correct this behavior.

Since adhering to the data model will require us to set attributes, we could either 1) follow the style and spirit of the semantic conventions, making a best effort to establish reasonable attribute names, or 2) use the raw values from journald (eg. _MACHINE_ID). The semantic conventions provide detailed guidance on naming attributes, so I believe we should apply these as best we can to this domain.

djaglowski avatar Aug 04 '22 14:08 djaglowski

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] avatar Nov 16 '22 03:11 github-actions[bot]

This is still relevant, but should wait until a final decision on the ECS OTEP.

djaglowski avatar Nov 16 '22 14:11 djaglowski

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] avatar Mar 16 '23 03:03 github-actions[bot]

This is still relevant, but should wait until a final decision on the ECS OTEP.

Since ECS OTEP has been merged, does that change anything regarding this journald issue? I can't see anything about the ECS OTEP that would have an impact.

cwegener avatar Nov 21 '23 21:11 cwegener

I am also interested in this. Should I revive the PR that @gregoryfranklin made?

fionera avatar Nov 27 '23 06:11 fionera

Looks like I mistakenly believed ECS defined semantics for journald which we would use. I think we can work out our own semantics but should keep the component maturity level at alpha until we've defined semantic conventions for any attributes which we use.

djaglowski avatar Nov 27 '23 15:11 djaglowski

Was there a Google Doc already to work on Journald semantics?

cwegener avatar Nov 27 '23 20:11 cwegener

Was there a Google Doc already to work on Journald semantics?

I'm not aware of one.

djaglowski avatar Nov 27 '23 21:11 djaglowski

Hi, I'm interested in this too :) @djaglowski @cwegener is there any work being done that it's not tracked? Anything that can be done to help out?

FeltrinN avatar Jan 12 '24 00:01 FeltrinN

Hi, I'm interested in this too :) @djaglowski @cwegener is there any work being done that it's not tracked? Anything that can be done to help out?

I haven't done any work on this. I think the first step would be to review the fields in Journald and figure out how many of them should/can be mapped into OTel Log attributes. And if there is any impact on existing or not yet existing semantic conventions.

The available Journald fields are: https://www.freedesktop.org/software/systemd/man/latest/systemd.journal-fields.html

cwegener avatar Jan 12 '24 02:01 cwegener

Sounds good. I'll try putting together a table unless somebody objects :) Once I've got a reasonable draft I'll open a documentation PR...

FeltrinN avatar Jan 16 '24 00:01 FeltrinN

Sounds good. I'll try putting together a table unless somebody objects :) Once I've got a reasonable draft I'll open a documentation PR...

The old PR for this already had the first cut of mapping to semconv done, which might help. https://github.com/open-telemetry/opentelemetry-log-collection/pull/353/files#diff-913aee336b63830976ccc7a02800e0837017f628830bce49f88dc4aeabffdebe

I can't easily tell how much code of that old PR would actually still apply today though (e.g. I can't tell if the way how the severity mapping was done in the old PR would still how it should be done today)

cwegener avatar Jan 16 '24 00:01 cwegener

(e.g. I can't tell if the way how the severity mapping was done in the old PR would still how it should be done today)

I've done some very early stages work, but I think it's clear already that the severity mapping from the earlier PR is valid (and quite stable, as those mappings are already in an appendix to the semantic conventions). Therefore, if anyone feels like it, I think it should be okay to split the old PR and merge the severity mappings already, as getting the fields in order will probably take a bit longer...

FeltrinN avatar Jan 16 '24 23:01 FeltrinN

On this topic, is there an easy-ish way of doing this with transforms/existing tooling today?

yaleman avatar Mar 05 '24 01:03 yaleman