opentelemetry-specification
opentelemetry-specification copied to clipboard
Clarify the mapping between Google Cloud Logging and OpenTelemetry
What are you trying to achieve?
With this draft collector PR the PubsubReceiver will be able to handle the native translation of Google Cloud LogEntry (this is a feature provided by Google Cloud Logging: see View logs routed to Pub/Sub). We need more details on all the mapping from the LogEntry to the OpenTelemetry fields and attributes.
Additional context.
See this issue on the collector: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/23184
I've opened a draft PR https://github.com/open-telemetry/opentelemetry-specification/pull/3775 on the proposed clarifications. It would be great to have some involvement for Google contributors.
This makes sense, but the document modified in #3775 isn't normative. Perhaps this type of language belongs in a new document in the compatibility directory. Additionally, the new attributes defined should probably be defined and added to the semantic-conventions.
Hey @alexvanboxel,
I really like your suggested PR in that it adds the read side of the OLTP<-> Google Cloud Logging Mapping.
We've been having a lot of internal conversations around this mapping and have a (similar) proposed spec. @jack-berg and I discussed where this documentation might belong (see his comment above). I think we have a few options:
- We can create a new compatibility section that outlines this mapping.
- We could put this document in the collector near the receiver to denote what it does to GCP data.
Which would you prefer?
I think a new compatibility section is a great idea. Does this mean that all the sections in the datamodel-appendix would move to the compatibility section?
Additionally, the new attributes defined should probably be defined and added to the semantic-conventions.
I don't think I've added anything new. I've used the semcon that was available (except the gcp.*). I want to go further and define the mapping for the HTTP structure, but although HTTP is described as stable, it is not clearly defined for logs yet.
@jsuereth I've moved it in the branch to the compatibility section
@jack-berg and @jsuereth doesn't it make sense to have a compatibility section in the logs directory itself and that all model-appendix move (one by one) to that directory?
It was silly of me to comment on the already-closed PR, so pasting my comments here.
on sourceLocation
:
looking at docs for LogEntrySourceLocation I this ought to be mapped to the equivalent semconv
code.*
attributes as follows:
sourceLocation.file
=>code.filepath
sourceLocation.line
=>code.lineno
Less sure of what to do with
sourceLocation.function
, which is described asHuman-readable name of the function or method being invoked, with optional context such as the class or package name. This information may be used in contexts such as the logs viewer, where a file and line number are less meaningful. The format can vary by language. For example: qual.if.ied.Class.method (Java), dir/package.func (Go), function (Python).
it ought to go into the
code.namespace
andcode.function
attributes. if you happen to have a sample of data across different languages for which gcp has a logging integration, perhaps the right thing to do can be figured out based on that? failing that, a reasonable best effort might be (python style split / indexing):
code.function
getssourceLocation.function.split(".")[-1]
code.namespace
gets".".join(sourceLocation.function.split(".")[:-1])
on httpRequest
:
could be worth attempting to map these to http semconv? not sure why I didn't do it initially.
requestMethod
=>http.request.method
requestUrl
=>url.full
requestSize
=> see [0] belowstatus
=>http.response.status_code
responseSize
=> see [0] belowuserAgent
=>user_agent.original
remoteIp
=>client.address
serverIp
=>server.address
referer
=>http.request.header.referer[0]
protocol
=>net.protocol.name
(lower case),net.protocol.version
The remaining ones can be mapped to
gcp.*
:
latency
cacheLookup
cacheHit
cacheValidatedWithOriginServer
cacheFillBytes
[0] I had previously opened https://github.com/open-telemetry/semantic-conventions/pull/84 to add "whole-request" size attributes, ie headers + body. I will rebase / update the PR this week to see if it gets any traction.
@open-telemetry/collector-contrib-maintainer can this get moved to collector-contrib repo?
I can add the mapping in the collector README, but I feel the documentation should be either in spec/compatibility or either semconv/compatibiltity.
I guess we can do the same we did on https://github.com/open-telemetry/opentelemetry-specification/pull/3964#pullrequestreview-1960788192. I still think collector-contrib is not the right place for this long term, but for now it seems reasonable to me
I'm actively working on improving the stability of the gcp pubsub log
-> otlp log
; I will include the documentation in the same commit. But a side note: It only makes sense to add the conventions I use in the receiver; I hope Google (or any other provider) will document their mapping.