opentelemetry-specification icon indicating copy to clipboard operation
opentelemetry-specification copied to clipboard

Clarify the mapping between Google Cloud Logging and OpenTelemetry

Open alexvanboxel opened this issue 1 year ago • 13 comments

What are you trying to achieve?

With this draft collector PR the PubsubReceiver will be able to handle the native translation of Google Cloud LogEntry (this is a feature provided by Google Cloud Logging: see View logs routed to Pub/Sub). We need more details on all the mapping from the LogEntry to the OpenTelemetry fields and attributes.

Additional context.

See this issue on the collector: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/23184

alexvanboxel avatar Nov 19 '23 18:11 alexvanboxel

I've opened a draft PR https://github.com/open-telemetry/opentelemetry-specification/pull/3775 on the proposed clarifications. It would be great to have some involvement for Google contributors.

alexvanboxel avatar Nov 19 '23 18:11 alexvanboxel

This makes sense, but the document modified in #3775 isn't normative. Perhaps this type of language belongs in a new document in the compatibility directory. Additionally, the new attributes defined should probably be defined and added to the semantic-conventions.

jack-berg avatar Nov 29 '23 15:11 jack-berg

Hey @alexvanboxel,

I really like your suggested PR in that it adds the read side of the OLTP<-> Google Cloud Logging Mapping.

We've been having a lot of internal conversations around this mapping and have a (similar) proposed spec. @jack-berg and I discussed where this documentation might belong (see his comment above). I think we have a few options:

  1. We can create a new compatibility section that outlines this mapping.
  2. We could put this document in the collector near the receiver to denote what it does to GCP data.

Which would you prefer?

jsuereth avatar Nov 29 '23 21:11 jsuereth

I think a new compatibility section is a great idea. Does this mean that all the sections in the datamodel-appendix would move to the compatibility section?

alexvanboxel avatar Nov 30 '23 06:11 alexvanboxel

Additionally, the new attributes defined should probably be defined and added to the semantic-conventions.

I don't think I've added anything new. I've used the semcon that was available (except the gcp.*). I want to go further and define the mapping for the HTTP structure, but although HTTP is described as stable, it is not clearly defined for logs yet.

alexvanboxel avatar Nov 30 '23 06:11 alexvanboxel

@jsuereth I've moved it in the branch to the compatibility section

alexvanboxel avatar Dec 11 '23 07:12 alexvanboxel

@jack-berg and @jsuereth doesn't it make sense to have a compatibility section in the logs directory itself and that all model-appendix move (one by one) to that directory?

alexvanboxel avatar Dec 11 '23 07:12 alexvanboxel

It was silly of me to comment on the already-closed PR, so pasting my comments here.

on sourceLocation:

looking at docs for LogEntrySourceLocation I this ought to be mapped to the equivalent semconvcode.* attributes as follows:

  • sourceLocation.file => code.filepath
  • sourceLocation.line => code.lineno

Less sure of what to do with sourceLocation.function, which is described as

Human-readable name of the function or method being invoked, with optional context such as the class or package name. This information may be used in contexts such as the logs viewer, where a file and line number are less meaningful. The format can vary by language. For example: qual.if.ied.Class.method (Java), dir/package.func (Go), function (Python).

it ought to go into the code.namespace and code.function attributes. if you happen to have a sample of data across different languages for which gcp has a logging integration, perhaps the right thing to do can be figured out based on that? failing that, a reasonable best effort might be (python style split / indexing):

  • code.function gets sourceLocation.function.split(".")[-1]
  • code.namespace gets ".".join(sourceLocation.function.split(".")[:-1])



on httpRequest:

could be worth attempting to map these to http semconv? not sure why I didn't do it initially.

  • requestMethod => http.request.method
  • requestUrl => url.full
  • requestSize => see [0] below
  • status => http.response.status_code
  • responseSize => see [0] below
  • userAgent => user_agent.original
  • remoteIp => client.address
  • serverIp => server.address
  • referer => http.request.header.referer[0]
  • protocol => net.protocol.name (lower case), net.protocol.version

The remaining ones can be mapped to gcp.*:

  • latency
  • cacheLookup
  • cacheHit
  • cacheValidatedWithOriginServer
  • cacheFillBytes

[0] I had previously opened https://github.com/open-telemetry/semantic-conventions/pull/84 to add "whole-request" size attributes, ie headers + body. I will rebase / update the PR this week to see if it gets any traction.

kamalmarhubi avatar Jan 26 '24 21:01 kamalmarhubi

@open-telemetry/collector-contrib-maintainer can this get moved to collector-contrib repo?

austinlparker avatar Apr 30 '24 20:04 austinlparker

I can add the mapping in the collector README, but I feel the documentation should be either in spec/compatibility or either semconv/compatibiltity.

alexvanboxel avatar May 04 '24 10:05 alexvanboxel

I guess we can do the same we did on https://github.com/open-telemetry/opentelemetry-specification/pull/3964#pullrequestreview-1960788192. I still think collector-contrib is not the right place for this long term, but for now it seems reasonable to me

mx-psi avatar May 06 '24 09:05 mx-psi

I'm actively working on improving the stability of the gcp pubsub log -> otlp log; I will include the documentation in the same commit. But a side note: It only makes sense to add the conventions I use in the receiver; I hope Google (or any other provider) will document their mapping.

alexvanboxel avatar May 06 '24 10:05 alexvanboxel