Semconv 1.25

Open dyladan opened this issue 1 year ago • 32 comments

This is a big PR but most of it is autogenerated. Below is a list of changes:

  • Update semconv to 1.25
  • Update semconv generator to 0.24
  • Output to experimental.ts and stable.ts so we can export separately in the future if required
    • experimental attributes and metrics now have @experimental jsdoc tag
  • Change SEMRESATTRS_ and RESATTRS_ to just ATTR_ for attributes
  • Generate constants for metric names with METRIC_ prefix
  • Deprecate all old names. These files will never change again and will be removed in 2.0, if we ever release one
  • All names are constants now. Removes requirement for all the weird type stuff (sorry @MSNev I know you spent a lot of time on that)
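As a rough illustration of the naming changes listed above, here is a minimal sketch. The constants are declared locally as stand-ins rather than imported from the package, so treat the exact set of names and the JSDoc wording as assumptions; the generated files may differ in detail.

```typescript
// Local stand-ins for illustration only. The real constants are generated
// into @opentelemetry/semantic-conventions by the semconv generator.

/** New style: a single ATTR_ prefix, whatever signal or resource uses the attribute. */
export const ATTR_HTTP_ROUTE = 'http.route';
export const ATTR_SERVICE_NAME = 'service.name';

/** New style: metric names get a METRIC_ prefix. */
export const METRIC_HTTP_SERVER_REQUEST_DURATION = 'http.server.request.duration';

/**
 * Old style names stay exported so existing code keeps compiling.
 * @deprecated use ATTR_HTTP_ROUTE
 */
export const SEMATTRS_HTTP_ROUTE = 'http.route';

/** @deprecated use ATTR_SERVICE_NAME */
export const SEMRESATTRS_SERVICE_NAME = 'service.name';

// Because every name is a plain string constant, it can be used directly
// as a computed key in an attributes object.
export const attributes: Record<string, string> = {
  [ATTR_HTTP_ROUTE]: '/users/:id',
  [ATTR_SERVICE_NAME]: 'checkout',
};
```

Since the old and new names resolve to the same string, switching an import from SEMATTRS_ or SEMRESATTRS_ to ATTR_ should not change the emitted telemetry.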

Notes:

  • Template attributes such as http.request.header.<key> are still not supported for now. It's not clear how we can/should support them, and until we make a decision I'd leave them out (they were excluded/didn't exist before)

Questions:

  • ~~For the main export import {} from '@opentelemetry/semantic-conventions' should ALL semconv be exported experimental and stable, or should only the stable be exported and experimental would be imported from @opentelemetry/semantic-conventions/experimental?~~ main export is stable only with backwards compatibility for previous releases.

Example: this is what it would look like to update the utils.ts file in the http instrumentation.

import {
  ATTR_HTTP_ROUTE,
  // These are not in the updated semconv and need to be imported with old names for now
  SEMATTRS_HTTP_CLIENT_IP,
  SEMATTRS_HTTP_HOST,
  SEMATTRS_HTTP_REQUEST_CONTENT_LENGTH_UNCOMPRESSED,
  SEMATTRS_HTTP_RESPONSE_CONTENT_LENGTH_UNCOMPRESSED,
  SEMATTRS_HTTP_SERVER_NAME,
  SEMATTRS_NET_HOST_IP,
  SEMATTRS_NET_PEER_IP,
} from '@opentelemetry/semantic-conventions';
import {
  NET_TRANSPORT_VALUES_IP_TCP,
  NET_TRANSPORT_VALUES_IP_UDP,
  ATTR_HTTP_FLAVOR,
  ATTR_HTTP_METHOD,
  ATTR_HTTP_REQUEST_CONTENT_LENGTH,
  ATTR_HTTP_RESPONSE_CONTENT_LENGTH,
  ATTR_HTTP_SCHEME,
  ATTR_HTTP_STATUS_CODE,
  ATTR_HTTP_TARGET,
  ATTR_HTTP_URL,
  ATTR_HTTP_USER_AGENT,
  ATTR_NET_HOST_NAME,
  ATTR_NET_HOST_PORT,
  ATTR_NET_PEER_NAME,
  ATTR_NET_PEER_PORT,
  ATTR_NET_TRANSPORT,
} from '@opentelemetry/semantic-conventions/experimental';

dyladan avatar May 09 '24 15:05 dyladan

/cc @trentm @JamieDanielson since you seemed interested in this

/cc @MSNev since you have done the most work on this recently

dyladan avatar May 09 '24 15:05 dyladan

I'll take a while to review this. I'm still trying to grok the generation, the semantic-conventions/model vs schemas/... subdirs, etc. Some early Qs/thoughts:

  • I gather merging the SEMRESATTRS_ and SEMATTRS_ groups is related to the "Problem" described at https://github.com/open-telemetry/semantic-conventions/issues/551. A hearty +1 to not using those prefixes. Did you consider also dropping the "ATTRS_" prefix? IIUC the Go semconv does not have any prefix on the exports from its semconv package. Java has namespacing of a different sort via the HttpAttributes part of import io.opentelemetry.semconv.HttpAttributes.
  • Similar to above, did you consider not having the METRIC_ prefix on metrics-related constants? (I don't see metrics-related values in open-telemetry/semantic-conventions-java.git and I'm not sure why. Does OTel Java not publish a package with metrics semconv constants?)

Correctness Qs:

  • Are you sure that the "deprecated" dirs in "semantic-conventions/model/..." handle all the deprecated values? For example http.resend_count was renamed to http.request.resend_count, but with your PR there is no deprecated HTTP_RESEND_COUNT entry.
  • http.client_ip is deprecated. There is a SEMATTRS_HTTP_CLIENT_IP but no ATTR_HTTP_CLIENT_IP, even though:
/**
 * @deprecated use ATTR_HTTP_CLIENT_IP
 */
export const SEMATTRS_HTTP_CLIENT_IP = TMP_HTTP_CLIENT_IP;

Same for SEMATTRS_DB_CASSANDRA_KEYSPACE, and I assume for others.

trentm avatar May 09 '24 21:05 trentm

The _VALUES_ fields are using the description of the field for which they are values as their comment, e.g.:

/**
 * The language of the telemetry SDK.
 */
export const TELEMETRY_SDK_LANGUAGE_VALUES_CPP = 'cpp';

/**
 * The language of the telemetry SDK.
 */
export const TELEMETRY_SDK_LANGUAGE_VALUES_DOTNET = 'dotnet';

/**
 * The language of the telemetry SDK.
 */
export const TELEMETRY_SDK_LANGUAGE_VALUES_ERLANG = 'erlang';

/**
 * The language of the telemetry SDK.
 */
export const TELEMETRY_SDK_LANGUAGE_VALUES_GO = 'go';

Can the description (is that the "brief" yaml field?) of the value be used, instead?

trentm avatar May 09 '24 21:05 trentm

> Did you consider also dropping the "ATTRS_" prefix? ... Similar to above, did you consider not having the METRIC_ prefix

Yes I did consider that and I still would consider it if we want to go that route. It is my understanding that the semconv has decided to use a registry of unique attributes that can be applied to any signal or resource so there is no reason to differentiate them. I only kept the ATTR and METRIC prefix just to make it easier to find the value you want when autocompleting and not get confused.

> Are you sure that the "deprecated" dirs in "semantic-conventions/model/..." handle all the deprecated values? For example http.resend_count was renamed to http.request.resend_count, but with your PR there is no deprecated HTTP_RESEND_COUNT entry.

I'm actually sure they're NOT all there. The deprecated.yaml didn't exist when many of these were removed and they weren't all added back. I left all the old versions in the file they were already in, so it isn't a breaking change, but I am going to add the missing attributes to the registry anyway (see https://github.com/open-telemetry/semantic-conventions/pull/1025)

> Can the description (is that the "brief" yaml field?) of the value be used, instead?

Good catch. I'll update the PR

dyladan avatar May 10 '24 12:05 dyladan

> Can the description (is that the "brief" yaml field?) of the value be used, instead?
>
> Good catch. I'll update the PR

Unfortunately it looks like there aren't actually descriptions on the values themselves. I think the intellisense autocomplete looks ok anyway though:

image

dyladan avatar May 10 '24 12:05 dyladan

@trentm what about this?

image

dyladan avatar May 10 '24 12:05 dyladan

I ended up with something like this:

/**
 * Enum value 'created' for attribute {@link ATTR_ANDROID_STATE}.
 *
 * @experimental this attribute is experimental and is subject to change in minor releases of `@opentelemetry/semantic-conventions`.
 */
export const ANDROID_STATE_VALUES_CREATED = 'created';

Which looks like this and actually links back to its parent attribute

image
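To make the shape of that generated output concrete, here is a minimal sketch of how one enum-value constant could be rendered. This is not the PR's actual generator template; `toConstName` and `renderEnumValue` are hypothetical helpers, and the sketch assumes values contain no single quotes.

```typescript
/** Convert a dotted semconv id such as `android.state` to SCREAMING_SNAKE_CASE. */
function toConstName(id: string): string {
  return id.replace(/[^A-Za-z0-9]+/g, '_').toUpperCase();
}

/**
 * Render one enum-value constant with a JSDoc comment that links back to
 * its parent attribute. Hypothetical helper, for illustration only.
 */
export function renderEnumValue(
  attributeId: string,
  valueId: string,
  value: string,
  experimental: boolean
): string {
  const attrConst = `ATTR_${toConstName(attributeId)}`;
  const valueConst = `${toConstName(attributeId)}_VALUES_${toConstName(valueId)}`;
  const lines = [
    '/**',
    ` * Enum value '${value}' for attribute {@link ${attrConst}}.`,
  ];
  if (experimental) {
    lines.push(
      ' *',
      ' * @experimental this attribute is experimental and is subject to change in minor releases of `@opentelemetry/semantic-conventions`.'
    );
  }
  lines.push(' */', `export const ${valueConst} = '${value}';`);
  return lines.join('\n');
}

// Example: reproduces the shape of the android.state snippet quoted above.
export const rendered = renderEnumValue('android.state', 'created', 'created', true);
```

The `{@link ...}` tag is what lets the editor tooltip jump from the value constant to the attribute it belongs to.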

dyladan avatar May 10 '24 13:05 dyladan

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.04%. Comparing base (ecc88a3) to head (9bd9802). Report is 25 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4690   +/-   ##
=======================================
  Coverage   91.04%   91.04%           
=======================================
  Files          89       89           
  Lines        1954     1954           
  Branches      416      416           
=======================================
  Hits         1779     1779           
  Misses        175      175           

codecov[bot] avatar May 13 '24 12:05 codecov[bot]

> For the main export import {} from '@opentelemetry/semantic-conventions' should ALL semconv be exported experimental and stable, or should only the stable be exported and experimental would be imported from @opentelemetry/semantic-conventions/experimental?

🤔 The benefit of keeping experimental attributes in /experimental subdirectory is that we are making it very explicit that it is an experimental attribute. I guess the downside is when the experimental attribute becomes stable, the consumer of that would have to update their code when they upgrade packages?

JamieDanielson avatar May 13 '24 15:05 JamieDanielson

> So far this is looking really great, thanks @dyladan and thanks @trentm for the review so far.
>
> I like ATTR better than SEMATTRS and SEMRESATTRS and wish I had realized this sooner and commented on the original PR that introduced them. I'm not clear what the full benefit is of having those other prefixes, although it may have been more relevant before they were in the global registry.

Exactly. Previously there was some chance (although probably it wouldn't have happened) that the same attribute could have been defined for different signals. I think the most reasonable way this could have happened would be for an attribute to have bounded specific values for metrics to control cardinality, but be unbounded for other signals or resources.

> I only kept the ATTR and METRIC prefix just to make it easier to find the value you want when autocompleting and not get confused.
>
> I'm not sure I understand the value of having ATTR prefix for attributes but no prefix for values. In that case I'd think they could be prefixed as well, or prefix neither.

The way we have it in this PR values have a postfix (actually an infix between the enum name and the value name). It provides separation between the enum name and the value name so it is distinguishable easily. For example, HOST_TYPE_LINUX is less obvious to me than HOST_TYPE_VALUE_LINUX where it is clear that LINUX is the value for the HOST_TYPE enum (these are fake attributes I just made up to prove a point).

dyladan avatar May 13 '24 15:05 dyladan

> For the main export import {} from '@opentelemetry/semantic-conventions' should ALL semconv be exported experimental and stable, or should only the stable be exported and experimental would be imported from @opentelemetry/semantic-conventions/experimental?
>
> 🤔 The benefit of keeping experimental attributes in /experimental subdirectory is that we are making it very explicit that it is an experimental attribute. I guess the downside is when the experimental attribute becomes stable, the consumer of that would have to update their code when they upgrade packages?

I guess Java has a separate package for experimental attributes - there's a semconv in instrumentation-api, and a semconv in instrumentation-api-incubator. Python also has semconv in incubating separate from semconv stable. Go seems to have it all in one.

JamieDanielson avatar May 13 '24 15:05 JamieDanielson

> 🤔 The benefit of keeping experimental attributes in /experimental subdirectory is that we are making it very explicit that it is an experimental attribute. I guess the downside is when the experimental attribute becomes stable, the consumer of that would have to update their code when they upgrade packages?

This is the definition of experimental... It also would force users to at least consider if they need to make a change. If the semconv attributes you're using change it might be good to force our users to acknowledge that by changing to the stable export. If they can get all from a single export they may never notice if something is renamed/deprecated.

> I guess Java has a separate package for experimental attributes - there's a semconv in instrumentation-api, and a semconv in instrumentation-api-incubator. Python also has semconv in incubating separate from semconv stable. Go seems to have it all in one.

I think a single package with multiple entry points is roughly equivalent to having separate packages and less overhead. Go has all in one but they export each version separately so you have to do something to get the new semconv version.

dyladan avatar May 13 '24 16:05 dyladan

> I guess Java has a separate package for experimental attributes

My understanding of the OTel Java team's recommendations/requirements is that they do not allow a stable instrumentation package to have a dependency on the instrumentation-api-incubating package. They instead suggest the instrumentation have a copy of the experimental attributes in its own package code. This means that a user of the (non-experimental) semconv package is never broken by a semver-minor update of the package.

I guess we could get the equivalent by either (a) never using the "../experimental" entry point in stable instrumentation packages, or (b) pinning the @opentelemetry/semantic-conventions dep to a particular minor in packages that do.
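A minimal sketch of the "local copy" approach described above for Java, translated to a JS instrumentation package. The file name, the choice of attributes, and the `clientAttributes` helper are assumptions made for illustration; nothing in this thread prescribes them.

```typescript
// src/semconv.ts (hypothetical file inside a stable instrumentation package)
//
// Local copies of experimental semconv values. With these, the package does
// not import from the experimental entry point, so a minor release of
// @opentelemetry/semantic-conventions that renames or removes an
// experimental export cannot break it.

/** Local copy of the experimental `http.flavor` attribute name. */
export const ATTR_HTTP_FLAVOR = 'http.flavor';

/** Local copy of the experimental `net.peer.name` attribute name. */
export const ATTR_NET_PEER_NAME = 'net.peer.name';

/** Build span attributes using only the locally copied names. */
export function clientAttributes(
  flavor: string,
  peerName: string
): Record<string, string> {
  return {
    [ATTR_HTTP_FLAVOR]: flavor,
    [ATTR_NET_PEER_NAME]: peerName,
  };
}
```

The trade-off is that the copies have to be updated by hand when the conventions change, which is the same acknowledgement step the separate experimental entry point is meant to force.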

> I think a single package with multiple entry points is roughly equivalent to having separate packages and less overhead.

Agreed.

> Go has all in one but they export each version separately so you have to do something to get the new semconv version.

This PR beat me to an attempt to update the semconv package. FWIW, I had been considering having separate entry points for each semconv version. See https://github.com/open-telemetry/opentelemetry-js/issues/4572#issuecomment-2108928693 I'm not advocating that option over this PR, however.

trentm avatar May 13 '24 22:05 trentm

> I ended up with something like this: [screenshot of intellisense for a _VALUES_ field]

Nice. That looks good.

> I only kept the ATTR and METRIC prefix just to make it easier to find the value you want when autocompleting and not get confused.

My soft vote is for no prefixes. The way I was thinking/expecting developers would use semconv values was to (a) have a semantic-conventions document open (e.g. https://opentelemetry.io/docs/specs/semconv/http/http-metrics/) and see a string (e.g. http.server.request.duration) and (b) then want to be able to import HTTP_SERVER_REQUEST_DURATION.

IIUC, autocomplete will show ATTR_HTTP_* and METRIC_HTTP_* values when typing HTTP so I think it is fine for autocomplete either way. Having the METRIC_ does help the developer that knows they are scoped to metrics stuff. ATTR_ feels out of place for non-metrics, non-logs stuff.

Another small reason is that I like the shorter names in code.

This is a soft vote though. I don't have a very strong reaction to ATTR_.

trentm avatar May 13 '24 22:05 trentm

> The way we have it in this PR values have a postfix (actually an infix between the enum name and the value name)

I like the _VALUES_ infix. I'm not sure if it reads better as _VALUE_ (singular).

trentm avatar May 13 '24 22:05 trentm

> IIUC, autocomplete will show ATTR_HTTP_* and METRIC_HTTP_* values when typing HTTP so I think it is fine for autocomplete either way

yes.

> Having the METRIC_ does help the developer that knows they are scoped to metrics stuff. ATTR_ feels out of place for non-metrics, non-logs stuff.

It helps the developers who don't have the semconv doc open, are just trying to find an attribute quickly, and are scanning the autocomplete list. They can type http and quickly either filter out or ignore the metric names.

dyladan avatar May 14 '24 17:05 dyladan

What about providing a Schema URL value (or values)? My very limited understanding is that including a semconv schemaUrl in instrumentationScope is being suggested as a path for downstream observability systems to be able to handle semconv changes.

Go, with a separate explicit import for each semconv version, has a single SchemaURL export for each package (e.g. https://github.com/open-telemetry/opentelemetry-go/blob/main/semconv/v1.25.0/schema.go). Java has an array of recent ones: https://github.com/open-telemetry/semantic-conventions-java/blob/main/semconv/src/main/java/io/opentelemetry/semconv/SchemaUrls.java

This could be handled separately, as well.
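For reference, a minimal sketch of what a single Schema URL export could look like in the JS package, loosely mirroring the Go approach linked above. `SEMCONV_VERSION`, `SCHEMA_URL`, and the `scope` object are hypothetical names; the PR does not add any of them. The only fact relied on is that published schema URLs have the form `https://opentelemetry.io/schemas/<version>`.

```typescript
/** Semconv version a hypothetical build of the package was generated from. */
export const SEMCONV_VERSION = '1.25.0';

/**
 * Hypothetical export, not part of this PR.
 * Schema URLs follow the pattern https://opentelemetry.io/schemas/<version>.
 */
export const SCHEMA_URL = `https://opentelemetry.io/schemas/${SEMCONV_VERSION}`;

/** Shape of the scope information an instrumentation would report. */
interface InstrumentationScopeInfo {
  name: string;
  version?: string;
  schemaUrl?: string;
}

// An instrumentation could attach the schema URL to its scope so that
// backends know which semconv version its attribute names follow.
export const scope: InstrumentationScopeInfo = {
  name: 'my-http-instrumentation', // placeholder name
  version: '0.1.0', // placeholder version
  schemaUrl: SCHEMA_URL,
};
```

As far as I know, `getTracer` in `@opentelemetry/api` already accepts an options object with a `schemaUrl` field, so a value like this could be passed there without API changes.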

trentm avatar May 14 '24 22:05 trentm

> What about providing a Schema URL value (or values)? My very limited understanding is that including a semconv schemaUrl in instrumentationScope is being suggested as a path for downstream observability systems to be able to handle semconv changes.
>
> Go, with a separate explicit import for each semconv version, has a single SchemaURL export for each package (e.g. https://github.com/open-telemetry/opentelemetry-go/blob/main/semconv/v1.25.0/schema.go). Java has an array of recent ones: https://github.com/open-telemetry/semantic-conventions-java/blob/main/semconv/src/main/java/io/opentelemetry/semconv/SchemaUrls.java
>
> This could be handled separately, as well.

I think I'd rather handle schema url as a separate addition later

> This is a significant usage change for this package. It would be nice to have more prose in the changelog about the change. It would also be good to update the README example and perhaps explain the ATTR_, METRIC_ and VALUES usage.

I agree I'll add more detail

> Do you actually get autocomplete in VSCode in this statement? I don't.

Yes I do. I hit ctrl+enter to bring up the autocomplete list.


I'll go through the more detailed review responses later today

dyladan avatar May 15 '24 11:05 dyladan

> Do you actually get autocomplete in VSCode in this statement? I don't.
>
> Yes I do. I hit ctrl+enter to bring up the autocomplete list.

Thanks for teaching me my editor. :) For me it is Control+Space or Cmd+I (https://stackoverflow.com/questions/56143239/how-to-trigger-vs-code-intellisense-using-keyboard-on-os-x).

trentm avatar May 15 '24 21:05 trentm

@MSNev what would you think of combining the _VALUES_ infix option in this PR with an ENUM_ prefix? It would look something like this:

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const ENUM_LOG_IOSTREAM_VALUES_STDOUT = 'stdout';
export const ENUM_LOG_IOSTREAM_VALUES_STDERR = 'stderr';

edit: I like someone's suggestion above to singular _VALUE_ instead of plural _VALUES_ so I'd probably make that change

dyladan avatar May 22 '24 20:05 dyladan

Reverting to draft while we wait on a resolution from https://github.com/open-telemetry/semantic-conventions/issues/1031

dyladan avatar May 23 '24 12:05 dyladan

Interestingly, perhaps, I just noticed that the recently updated Python semconv generation appends _TEMPLATE to the const name if the field is type: template[string[]]. So, for example, http.request.header:

      - id: request.header
        stability: stable
        type: template[string[]]
        brief: >
          HTTP request headers, `<key>` being the normalized HTTP Header name (lowercase), the value being the header values.

is named HTTP_REQUEST_HEADER_TEMPLATE and not HTTP_REQUEST_HEADER. See https://github.com/open-telemetry/opentelemetry-python/blob/8b80a28e825b102417eceb429f64d5ce52f3c2e7/scripts/semconv/templates/semantic_attributes.j2#L24
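If the JS package followed the same naming, a template attribute could be exposed roughly as sketched below. Both the `ATTR_HTTP_REQUEST_HEADER_TEMPLATE` constant and the `httpRequestHeaderKey` helper are hypothetical; the PR deliberately leaves template attributes out. The lowercasing follows the "normalized HTTP Header name (lowercase)" wording in the YAML brief above.

```typescript
/**
 * Hypothetical constant: the fixed prefix of the `http.request.header.<key>`
 * template attribute. No such export exists in the PR under discussion.
 */
export const ATTR_HTTP_REQUEST_HEADER_TEMPLATE = 'http.request.header';

/**
 * Hypothetical helper: build the full attribute key for one header by
 * appending the lowercased header name to the template prefix.
 */
export function httpRequestHeaderKey(headerName: string): string {
  return `${ATTR_HTTP_REQUEST_HEADER_TEMPLATE}.${headerName.toLowerCase()}`;
}

// The attribute type is template[string[]], so each value is a list of strings.
export const headerAttributes: Record<string, string[]> = {
  [httpRequestHeaderKey('Content-Type')]: ['application/json'],
  [httpRequestHeaderKey('Accept')]: ['text/html', 'application/json'],
};
```

A suffix such as _TEMPLATE signals that the constant is only a key prefix and is not meant to be used as an attribute name on its own.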

trentm avatar May 23 '24 22:05 trentm

> Interestingly, perhaps, I just noticed that the recently updated Python semconv generation appends _TEMPLATE to the const name if the field is type: template[string[]]. So, for example, http.request.header:
>
>       - id: request.header
>         stability: stable
>         type: template[string[]]
>         brief: >
>           HTTP request headers, `<key>` being the normalized HTTP Header name (lowercase), the value being the header values.
>
> Is HTTP_REQUEST_HEADER_TEMPLATE and not HTTP_REQUEST_HEADER https://github.com/open-telemetry/opentelemetry-python/blob/8b80a28e825b102417eceb429f64d5ce52f3c2e7/scripts/semconv/templates/semantic_attributes.j2#L24

Yeah we're actually ignoring those for now. I was going to add them in a follow-up because they are handled differently

dyladan avatar May 24 '24 13:05 dyladan

@MSNev does the limitation on attribute values not being the same as attribute namespaces mentioned by @lmolkova in https://github.com/open-telemetry/semantic-conventions/issues/1064 ease your concerns with enum names? There should be no collision.

dyladan avatar May 29 '24 15:05 dyladan

> @MSNev does the limitation on attribute values not being the same as attribute namespaces mentioned by @lmolkova in open-telemetry/semantic-conventions#1064 ease your concerns with enum names? There should be no collision.

If we want to go with changing the names of the values to full screaming snake case rather than the existing <attribute name screaming>_<value name as screaming snake case> (e.g. other languages use camel case for the names), then, as we are already prefixing all attributes with ATTR_, I think the better option would be your option 3: use a more generic prefix (as they may not necessarily be considered to be enums) like VAL_ or VALUE_

so

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const LOGIOSTREAM_STDOUT = 'stdout';
export const LOGIOSTREAM_STDERR = 'stderr';

becomes

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const VAL_LOG_IO_STREAM_STDOUT = 'stdout';
export const VAL_LOG_IO_STREAM_STDERR = 'stderr';

This way there would always be zero chance of any conflict, vs the infix option. This would also work for whatever the outcome of the client.id / client_id resolution will be (it looks like the recommendation will be '_' -> '__', with the final option up to each language, as not all languages use snake case).

Personally, I prefer the existing (but I guess I'm a little biased as that was my original choice) to convert the CamelCased values classes to the combination to avoid clashes 😀

MSNev avatar May 29 '24 15:05 MSNev

General comment on this from the description:

> All names are constants now. Removes requirement for all the weird type stuff (sorry @MSNev I know you spent a lot of time on that)

This was actually always the goal, the namespace "fun" was just part of the stepping stones to move forward and to try and keep the generated package as small as possible without just duplicating the string.

MSNev avatar Jun 05 '24 15:06 MSNev

> General comment on this from the description:
>
> All names are constants now. Removes requirement for all the weird type stuff (sorry @MSNev I know you spent a lot of time on that)
>
> This was actually always the goal, the namespace "fun" was just part of the stepping stones to move forward and to try and keep the generated package as small as possible without just duplicating the string.

I decided to just duplicate it. I think not long from now we'll go 2.0 and remove the namespace fun entirely.

dyladan avatar Jun 05 '24 20:06 dyladan

> then as we are already prefixing all attributes with ATTR_ I think the better option would be your option 3 [...] like VAL_ or VALUE_ [...]
>
> export const ATTR_LOG_IOSTREAM = 'log.iostream';
> export const VAL_LOG_IOSTREAM_STDOUT = 'stdout';
> export const VAL_LOG_IOSTREAM_STDERR = 'stderr';

Comparing this to other options:

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const LOG_IOSTREAM_VALUES_STDOUT = 'stdout';
export const LOG_IOSTREAM_VALUES_STDERR = 'stderr';

or perhaps singular VALUE:

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const LOG_IOSTREAM_VALUE_STDOUT = 'stdout';
export const LOG_IOSTREAM_VALUE_STDERR = 'stderr';

I can understand the desire for a prefix (e.g. VAL_), given ATTR_ is a prefix (was there another reason for that preference?). However, I like that the "VALUE(S)_" string separates the attribute name and value in the latter options.

The infix _VALUES_ separation is more helpful with an enum value that includes a dot (.). However, that is very rare: only the deprecated HTTP_FLAVOR enum values include a dot in their IDs.

export const VAL_HTTP_FLAVOR_HTTP1_0 = '1.0' as const;
export const VAL_HTTP_FLAVOR_HTTP1_1 = '1.1' as const;
export const VAL_HTTP_FLAVOR_HTTP2_0 = '2.0' as const;

I have a slight preference for infix _VALUE_, but not a strong aversion to the other options.
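To compare the three naming options side by side, here is a small sketch that derives the constant name for an enum value under each scheme. `Scheme`, `screaming`, and `valueConstName` are hypothetical names used only for this comparison; they are not part of the generator.

```typescript
/** The three naming options discussed in this thread. */
type Scheme = 'prefix' | 'infix-plural' | 'infix-singular';

/** Convert a semconv id such as `log.iostream` to SCREAMING_SNAKE_CASE. */
function screaming(id: string): string {
  return id.replace(/[^A-Za-z0-9]+/g, '_').toUpperCase();
}

/** Derive the exported constant name for one enum value under a given scheme. */
export function valueConstName(
  attributeId: string,
  valueId: string,
  scheme: Scheme
): string {
  const attr = screaming(attributeId);
  const val = screaming(valueId);
  switch (scheme) {
    case 'prefix':
      // VAL_ prefix: parallel to ATTR_, but attribute and value names run together.
      return `VAL_${attr}_${val}`;
    case 'infix-plural':
      // _VALUES_ infix: the form currently generated in this PR.
      return `${attr}_VALUES_${val}`;
    case 'infix-singular':
      // _VALUE_ infix: the singular variant suggested in review.
      return `${attr}_VALUE_${val}`;
  }
}

// Names produced for the log.iostream / stdout example used above.
export const names = {
  prefix: valueConstName('log.iostream', 'stdout', 'prefix'),
  infixPlural: valueConstName('log.iostream', 'stdout', 'infix-plural'),
  infixSingular: valueConstName('log.iostream', 'stdout', 'infix-singular'),
};
```

With the infix forms, the boundary between attribute name and value name stays visible even when the value itself contains underscores, which is the separation argument made above.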

trentm avatar Jun 05 '24 21:06 trentm

> I have a slight preference for infix _VALUE_, but not a strong aversion to the other options.

I tend to agree with you

dyladan avatar Jun 06 '24 12:06 dyladan

@trentm wdyt about https://github.com/open-telemetry/opentelemetry-js/pull/4690/commits/2dce2eb6fec58bcebb78a293665797f9901ab63e

dyladan avatar Jun 06 '24 12:06 dyladan