opentelemetry-specification

How to classify browser and mobile telemetry

Open • martinkuba opened this issue 2 years ago • 7 comments

What are you trying to achieve?

The client-side instrumentation SIG is working on defining telemetry for client-side applications, and we believe we need a way to classify this telemetry. The reasons for this are:

  • backends processing client-side telemetry might want to perform specific post-processing or analysis of the data
  • vendors might want to present a different UI experience for browser and mobile devices than for backend services

We would like to specify which attributes an SDK MUST include on a resource in order for the telemetry to be interpreted as browser or mobile. We could use guidance (or further discussion) on which approach makes sense in the wider context of the project.

Possible options for classifying browser telemetry

  1. presence of browser attributes on the resource (proposed here)

This aligns with how semantic conventions have been used so far, and it is also recommended in this PR.

Counter arguments:

  • it may not be possible to collect browser attributes in every environment
  • browser attributes would primarily collect information about the user agent, which should be optional because it is a supplemental piece of information (and the telemetry is still useful without it)
  • instead, we would be making it required by saying: if you want this to be treated as browser telemetry, then you MUST capture the user agent

  2. value of the process.runtime.name attribute

This is already defined in the specs for JavaScript runtimes here.

Counter arguments:

  • the accompanying process.runtime.version attribute does not make much sense for browsers - the current example in the spec shows the user-agent string, which includes more information than a version
  3. schema

I am not sure whether this falls under the intended purpose of schemas. The idea is to have a schema that is unique to client-side telemetry, so that classification would be based on the value of the schema field.
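To make the difference between options 1 and 2 concrete, here is a minimal TypeScript sketch of how a consumer might check each one. It assumes resource attributes arrive as a flat key/value map; the ResourceAttributes type and the "checkout-web" service name are hypothetical, not an SDK API. It also illustrates the first counter argument to option 1: a browser resource without any captured browser.* attributes is missed by option 1 but still recognized by option 2.

```typescript
// Hypothetical flat view of resource attributes on the consumer side (not an SDK type).
type ResourceAttributes = Record<string, string | number | boolean | undefined>;

// Option 1: treat the resource as browser telemetry if any browser.* attribute is set.
function isBrowserByAttributes(attrs: ResourceAttributes): boolean {
  return Object.keys(attrs).some(
    (key) => key.startsWith("browser.") && attrs[key] !== undefined,
  );
}

// Option 2: treat the resource as browser telemetry if process.runtime.name is "browser".
function isBrowserByRuntime(attrs: ResourceAttributes): boolean {
  return attrs["process.runtime.name"] === "browser";
}

// A browser resource where no user agent could be captured (hypothetical service name).
const withoutUserAgent: ResourceAttributes = {
  "service.name": "checkout-web",
  "process.runtime.name": "browser",
};

console.log(isBrowserByAttributes(withoutUserAgent)); // false: option 1 misses it
console.log(isBrowserByRuntime(withoutUserAgent)); // true: option 2 still matches
```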

Possible options for classifying mobile telemetry

  1. presence of device attributes

It seems that these attributes were originally intended for mobile devices.

Counter arguments:

  • the term device is generic enough that it could also be used for IoT devices or even infrastructure

  2. value of the os.name attribute

The examples in the spec include Android and iOS.

Counter arguments:

  • consumers of the data will need to know a full list of OS names that apply to mobile devices
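A similar consumer-side sketch for the two mobile options, again assuming a hypothetical flat ResourceAttributes map. MOBILE_OS_NAMES is an illustrative, deliberately incomplete list built from the two example values in the spec, and the sensor resource is made up to show the counter argument that "device" is too generic.

```typescript
// Hypothetical flat view of resource attributes on the consumer side (not an SDK type).
type ResourceAttributes = Record<string, string | number | boolean | undefined>;

// Option 1: any device.* attribute marks the resource as mobile.
// Caveat: "device" is generic, so IoT devices or infrastructure would match as well.
function isMobileByDeviceAttributes(attrs: ResourceAttributes): boolean {
  return Object.keys(attrs).some((key) => key.startsWith("device."));
}

// Option 2: compare os.name against a list of mobile OS names.
// Hypothetical and incomplete: every consumer would have to maintain such a list.
const MOBILE_OS_NAMES: ReadonlySet<string> = new Set(["Android", "iOS"]);

function isMobileByOsName(attrs: ResourceAttributes): boolean {
  const osName = attrs["os.name"];
  return typeof osName === "string" && MOBILE_OS_NAMES.has(osName);
}

// A made-up IoT sensor that reports a device.id.
const sensor: ResourceAttributes = { "device.id": "sensor-42", "os.name": "Linux" };

console.log(isMobileByDeviceAttributes(sensor)); // true: a false positive for option 1
console.log(isMobileByOsName(sensor)); // false
```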

martinkuba, Apr 01 '22 23:04

Thanks for summarizing these options, @martinkuba!

Browser

Ad 1. If the proposed browser.platform was kept despite duplicating os.name, its presence would be sufficient without having to capture a user agent.

Ad 2. Having the existing process.runtime.name=browser attribute looks like a good approach to me. If the defined value for process.runtime.version does not make much sense, this could be changed independent from the classification issue discussed here.

Ad 3. I also don't think that introducing a separate schema just for client-side telemetry would make sense; it would open a lot of new questions. Would this be entirely separate? Would this be an extension to the "generic" schema so you would still be able to use the attributes defined here? How would versioning be handled? Also, this would make it necessary to have a "topmost"/application-level tracer that is guaranteed to use this schema, while other libraries with built-in instrumentation or separate instrumentation libraries might still use the generic schema. Furthermore, it would put an additional burden on telemetry consumers to potentially develop/maintain support for two "worlds" of such data.

Mobile

Ad 1. How about adding an open enum device.kind with a value mobile/handheld or the like? (distinguishing phone and tablet might be difficult and likely not even necessary/insightful)

Ad 2. For this to be feasible an enum like we have for os.type would be necessary.

arminru, Apr 05 '22 16:04

device.kind makes sense. When I've implemented this previously, we've had enumerated values like:

        - mobile
        - wearable
        - desktop
        - streamer

Here, "mobile" would include both phones and tablets, due to the difficulty of differentiating form factors on Android, and "desktop" would include both PCs and laptops. Naming is subject to debate.
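One possible way to model device.kind as an open enum in TypeScript, using the values listed above: known values are typed, but any other string is still accepted, so new kinds could be added later without breaking consumers. This is only a sketch of the proposal; device.kind is not an existing attribute here, and "tv" is a hypothetical example of an unlisted kind.

```typescript
// Values suggested in this thread ("mobile" covers phones and tablets, "desktop" covers PCs and laptops).
const KNOWN_DEVICE_KINDS = ["mobile", "wearable", "desktop", "streamer"] as const;
type KnownDeviceKind = (typeof KNOWN_DEVICE_KINDS)[number];

// Open enum: the known literals plus any other string.
type DeviceKind = KnownDeviceKind | (string & {});

function isKnownDeviceKind(kind: DeviceKind): kind is KnownDeviceKind {
  return (KNOWN_DEVICE_KINDS as readonly string[]).includes(kind);
}

// Consumers classify on a single attribute instead of maintaining OS-name lists.
function isMobile(attrs: Record<string, unknown>): boolean {
  return attrs["device.kind"] === "mobile";
}

const phone: DeviceKind = "mobile";
const tv: DeviceKind = "tv"; // accepted because the enum is open

console.log(isMobile({ "device.kind": phone })); // true
console.log(isKnownDeviceKind(tv)); // false
```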

ladd, Apr 05 '22 18:04

@arminru

> If the proposed browser.platform was kept despite duplicating os.name, its presence would be sufficient without having to capture a user agent.

In some browsers (older versions of Chromium, Firefox, Safari), it is not possible to get the platform value alone, only the full user-agent string.

> Having the existing process.runtime.name=browser attribute looks like a good approach to me. If the defined value for process.runtime.version does not make much sense, this could be changed independent from the classification issue discussed here.

I am in favor of this approach, as it is straightforward. I think we still need browser attributes in addition (see https://github.com/open-telemetry/opentelemetry-specification/pull/2353). I think it would make sense to capture the version (or user agent string) there instead.

With that said, I would like to know if others have any objections to using process.runtime.name = "browser".

> I also don't think that introducing a separate schema just for client-side telemetry would make sense; it would open a lot of new questions. Would this be entirely separate? Would this be an extension to the "generic" schema so you would still be able to use the attributes defined here? How would versioning be handled? Also, this would make it necessary to have a "topmost"/application-level tracer that is guaranteed to use this schema, while other libraries with built-in instrumentation or separate instrumentation libraries might still use the generic schema. Furthermore, it would put an additional burden on telemetry consumers to potentially develop/maintain support for two "worlds" of such data.

This was an idea mentioned by @jmacd. I would need more guidance from others whether this makes sense to pursue.

martinkuba, Apr 13 '22 00:04

@martinkuba

> In some browsers (older versions of Chromium, Firefox, Safari), it is not possible to get the platform value alone, only the full user-agent string.

So browser.platform would also be left empty in this case since the sentiment is to not impose the requirement of user-agent parsing on instrumentation, right?

> I am in favor of this approach, as it is straightforward. I think we still need browser attributes in addition (see #2353). I think it would make sense to capture the version (or user agent string) there instead.

For browser.user_agent this certainly makes sense, yes 👍 In ECS there is a top-level user_agent attribute (reference), but I think having a browser namespace at the top level makes sense, as we might add more browser-related attributes in the future.

arminru, Apr 15 '22 15:04

> So browser.platform would also be left empty in this case since the sentiment is to not impose the requirement of user-agent parsing on instrumentation, right?

Yes, I don't think we should put the burden of parsing the user-agent string on the client instrumentation.
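A minimal sketch of what that could look like on the instrumentation side, assuming the browser.user_agent and browser.platform names discussed above (proposals at the time of this thread). NavigatorLike and browserResourceAttributes are hypothetical names; they mirror navigator.userAgent and the User-Agent Client Hints navigator.userAgentData.platform, which not every browser exposes. The raw user-agent string is recorded unparsed, and browser.platform is simply omitted when the browser does not provide it directly.

```typescript
// Minimal subset of the browser's navigator object (hypothetical interface for testability).
interface NavigatorLike {
  userAgent: string;
  userAgentData?: { platform: string }; // absent in browsers without User-Agent Client Hints
}

type ResourceAttributes = Record<string, string>;

function browserResourceAttributes(nav: NavigatorLike): ResourceAttributes {
  const attrs: ResourceAttributes = {
    "process.runtime.name": "browser",
    // Record the raw string as-is; any parsing is left to the backend.
    "browser.user_agent": nav.userAgent,
  };
  // Only set browser.platform when the browser exposes it directly.
  const platform = nav.userAgentData?.platform;
  if (platform) {
    attrs["browser.platform"] = platform;
  }
  return attrs;
}

// An older browser that only exposes the full user-agent string.
const legacy = browserResourceAttributes({ userAgent: "Mozilla/5.0 (X11; Linux x86_64)" });
console.log("browser.platform" in legacy); // false
```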

martinkuba, Apr 15 '22 21:04

Before we close this, we should document the outcome in semantic conventions.

Other options:

  • entity signal
  • ongoing conversation about service.name
  • telemetry.sdk.language value
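As an illustration of the telemetry.sdk.language option, a hypothetical check could look for webjs, one of the well-known values the semantic conventions define for that attribute, and fall back to process.runtime.name otherwise. This is a sketch, not an agreed approach. The language value alone would not settle the mobile case, since a value such as java can come from an Android app as well as from a backend service.

```typescript
// Hypothetical flat view of resource attributes on the consumer side (not an SDK type).
type ResourceAttributes = Record<string, string | number | boolean | undefined>;

// Hypothetical browser check: prefer telemetry.sdk.language, fall back to process.runtime.name.
function looksLikeBrowser(attrs: ResourceAttributes): boolean {
  // "webjs" is a well-known telemetry.sdk.language value in the semantic conventions.
  if (attrs["telemetry.sdk.language"] === "webjs") {
    return true;
  }
  return attrs["process.runtime.name"] === "browser";
}

console.log(looksLikeBrowser({ "telemetry.sdk.language": "webjs" })); // true
console.log(looksLikeBrowser({ "telemetry.sdk.language": "nodejs" })); // false
```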

martinkuba, Apr 23 '24 16:04