semantic-conventions icon indicating copy to clipboard operation
semantic-conventions copied to clipboard

[general, attributes] Make the `thread.id` attribute a string

Open alevenberg opened this issue 2 years ago • 24 comments
trafficstars

Context: https://opentelemetry.io/docs/specs/semconv/general/attributes/#general-thread-attributes

In C++, there is no guaranteed integral conversion of std::this_thread::get_id() (https://github.com/googleapis/google-cloud-cpp/pull/13151/files#r1397504258)

So we are storing the attribute as a string in our implementation. Can we update the spec to make the attribute value a string or an int?

alevenberg avatar Nov 20 '23 21:11 alevenberg

cc @open-telemetry/cpp-approvers

trask avatar Nov 21 '23 00:11 trask

Friendly ping @open-telemetry/cpp-approvers

joaopgrassi avatar Feb 28 '24 16:02 joaopgrassi

In C++, there is no guaranteed integral conversion of std::this_thread::get_id()

This is correct, and applicable to both C++ and Rust. The core issue is that there isn't a universally safe and portable method to convert thread identifiers into a numeric (integer) form due to concerns around type safety. This contrasts with higher-level languages that operate atop a managed runtime environment. These higher-level languages can afford to abstract away the complexities and offer a more simplified, integer-based representation of thread identifiers.

For applications written in C++ or Rust, the realistic strategies boil down to a couple of workarounds:

  • Transforming the thread identifier into a hash and using that as a unique identifier, though this comes with the usual caveats about potential hash collisions.
  • Setting up a custom mapping between std::thread::id and integers, safeguarded by a mutex for thread safety.

However, both of these methods introduce extra complexity and potential performance hits, making them less than ideal solutions.

Better approach would be of representing thread IDs as strings for semantic purposes.

cc @open-telemetry/rust-maintainers @open-telemetry/cpp-maintainers

lalitb avatar Feb 28 '24 17:02 lalitb

@lalitb what do thread ids look like in C++/Rust?

@AlexanderWert @trisch-me have you run into this in ECS? it looks like ECS process.thread.id is also a number type

trask avatar Feb 28 '24 22:02 trask

what do thread ids look like in C++/Rust?

In C++, thread id is an abstraction over the internal platform-specific representation or implementation of a thread, and it may or may not be an integer. The C++ Standard does not mandate std::thread::id to be a numeric value; as long as it is unique, can be converted to string for debugging purposes, and can be compared. As an example, the below stuff runs successfully for posix/gcc, but is not guaranteed to succeed for other compilers/platforms - https://godbolt.org/z/Eje4eWvcj

#include <thread>
#include <sstream>
#include <string>
#include <iostream>

int main() {
    std::thread::id threadId = std::this_thread::get_id();
    std::ostringstream oss;
    oss << threadId;
    std::string threadIdStr = oss.str();
    std::cout << "ThreadId as string: " << threadIdStr << std::endl;
    // below statement can throw exception for certain compilers/platforms if conversion fails
    std::cout << "ThreadId as long :" <<stol(threadIdStr);
}

o/p:
ThreadId as string: 140550312396608
ThreadId as long :140550312396608

Rust being more secure, unlike C++ it doesn't even provide a built-in, straightforward way to imply a numeric conversion from a ThreadId.

lalitb avatar Feb 28 '24 23:02 lalitb

The C++ Standard does not mandate std::thread::id to be a numeric value; as long as it is unique, can be converted to string for debugging purposes, and can be compared.

Just to clarify, sometimes it will be a string that represents a number, e.g. "140550312396608", and sometimes it will be a string that doesn't represent a number, e.g. "bZUPbvaY4d"?

trask avatar Feb 28 '24 23:02 trask

Just to clarify, sometimes it will be a string that represents a number, e.g. "140550312396608", and sometimes it will be a string that doesn't represent a number, e.g. "bZUPbvaY4d"?

Thats correct.

lalitb avatar Feb 29 '24 00:02 lalitb

Other than it being a breaking change, changing thread.id from number to string seems ok.

I don't think the number-ness of the value is important (other than, again, of it being a breaking change).

@tedsuo @jack-berg @jsuereth any thoughts on the defacto stability status of thread.id?

trask avatar Feb 29 '24 01:02 trask

@mjwolf do you know anything about type of thread.id, if we had problems with number type before?

trisch-me avatar Feb 29 '24 10:02 trisch-me

Given the OTel definition of thread.id is "Current “managed” thread ID (as opposed to OS thread ID).", I don't have any objection to this being a string.

The ECS field usually refers to the OS thread ID, which should be a number. The OS thread ID is a number in all major OSs.

With OTel, I think there should be an additional attribute that is specifically for OS thread ID, with number type. It's a bit of a potential problem in ECS that the type of thread ID to use in the field is not well defined, and users could use it for both runtime or OS thread ID. I haven't seen heard of any actual problems related to it in ECS, but by having separate attributes in OTel, any problems could be avoided.

mjwolf avatar Feb 29 '24 11:02 mjwolf

but by having separate attributes in OTel, any problems could be avoided.

@mjwolf But what does that mean for ECS users? That would be breaking for them, no? (if we introduce a thread.os.id) 🤔

joaopgrassi avatar Feb 29 '24 15:02 joaopgrassi

Given the OTel definition of thread.id is "Current “managed” thread ID (as opposed to OS thread ID).", I don't have any objection to this being a string.

The ECS field usually refers to the OS thread ID, which should be a number. The OS thread ID is a number in all major OSs.

With OTel, I think there should be an additional attribute that is specifically for OS thread ID, with number type.

this is a great point. it sounds like it makes sense to have separate attributes for "managed" thread id and "os" thread id, since you may want to stamp both things onto the same telemetry.

@open-telemetry/semconv-system-approvers any thoughts?

trask avatar Feb 29 '24 23:02 trask

With OTel, I think there should be an additional attribute that is specifically for OS thread ID, with number type.

this is a great point. it sounds like it makes sense to have separate attributes for "managed" thread id and "os" thread id, since you may want to stamp both things onto the same telemetry.

std::this_thread::get_id() in C++ gives you the OS thread id, so there would be a similar problem with an integer thread.os.id.

If this breakage is acceptable for ECS, I'd advocate for staying with a single thread.id as a string. Although it's a number in all operating systems I know, there's no sense in applying number specific operations and comparisons to it (sums, averages, greater than ...).

pyohannes avatar Mar 01 '24 09:03 pyohannes

If this breakage is acceptable for ECS, I'd advocate for staying with a single thread.id as a string. Although it's a number in all operating systems I know, there's no sense in applying number specific operations and comparisons to it (sums, averages, greater than ...).

How are we going to ensure the "managed" thread id VS "os" thread id then? The current description clearly states "Current “managed” thread ID (as opposed to OS thread ID)". Will this be changed? And if so how we will be able to know from where the single attribute's value is derived (managed/os)?

ChrsMark avatar Mar 01 '24 10:03 ChrsMark

How are we going to ensure the "managed" thread id VS "os" thread id then?

If we're confident that we need to make this distinction, then a possible solution would be to stay with thread.id as number for "managed" threads (doesn't apply to C++ and Rust), and thread.os.id as a string (to satisfy C++ and Rust constraints).

A question to ECS folks: in your experience, is there a need to distinguish between managed and OS thread ids?

pyohannes avatar Mar 01 '24 12:03 pyohannes

The ECS field usually refers to the OS thread ID, which should be a number. The OS thread ID is a number in all major OSs.

Agreed, however, the C++ and Rust constraints stem from the fact that those languages (and their standard libraries) have to support exotic operation systems.

pyohannes avatar Mar 01 '24 13:03 pyohannes

If we're confident that we need to make this distinction, then a possible solution would be to stay with thread.id as number for "managed" threads (doesn't apply to C++ and Rust), and thread.os.id as a string (to satisfy C++ and Rust constraints).

I think this should be reversed? Managed thread ID should be strings, and OS thread ID should be number. But I agree they should have different types.

The C++ thread ID is the managed/higher-level ID, and AFAIK all OSs use numberic thread IDs. Also, as OS thread ID and process ID are very closely, it wouldn't make sense to me to have PID be number, and TID be string. The OTel PID field is already a number, and changing PID to string to match TID would be a larger breaking change. Unless anyone knows of an OS that actually uses non-numberic PID/TID, I think both should be a number type.

A question to ECS folks: in your experience, is there a need to distinguish between managed and OS thread ids?

I think it's important to have managed and OS thread ID as separate fields, as they are separate logical concepts. If it was only one field, an application that wanted to send both managed and OS thread ID in the same log wouldn't be able to include both. Also, if different applications use the different types in the single field, it will cause confusion and data integrity problems later, as it won't be possible to know what is actually stored in the field. With ECS, I haven't heard of these problems happening with thread ID specifically, but I have heard of these problems with other fields, so it would be best to avoid these problems with thread ID.

mjwolf avatar Mar 04 '24 19:03 mjwolf

But what does that mean for ECS users? That would be breaking for them, no? (if we introduce a thread.os.id)

Yes, this might be a breaking change for process.thread.id. We could also go the other direction. Make thread.id as OS id to not break ECS and add additional field for managed thread id.

trisch-me avatar Mar 04 '24 21:03 trisch-me

The C++ thread ID is the managed/higher-level ID, and AFAIK all OSs use numberic thread IDs.

I see. std::thread directly wraps OS threads in all cases I'm aware of, however, we aren't guaranteed to obtain the OS thread id, so it makes sense to classify those as "managed" threads.

I think this should be reversed? Managed thread ID should be strings, and OS thread ID should be number. But I agree they should have different types.

Makes sense.

@mjwolf @trisch-me Would you agree with the following approach:

  1. As thread.id is defined to hold a "managed" thread id, let's make it a string (which resolves this issue).
  2. Discuss the introduction of OS thread ids and compatibility with ECS in a separate issue.

pyohannes avatar Mar 05 '24 14:03 pyohannes

Hey @pyohannes let's do it then. So it seems we need to differentiate between OS and managed thread. In this case let thread.id to be a managed thread as it is now and change type to string

As for OS thread id - let's discuss it here

trisch-me avatar Mar 13 '24 17:03 trisch-me

Hey @pyohannes let's do it then. So it seems we need to differentiate between OS and managed thread. In this case let thread.id to be a managed thread as it is now and change type to string

Great, I opened #836 for this.

pyohannes avatar Mar 26 '24 14:03 pyohannes

Why not having semantic conventions for each runtime/language instead?

E.g. jvm.thread.id, jvm.thread.name for Java, go.goroutine.id for Go, python.asynciotask.name for Python, etc.

I would leave thread.id and thread.name for OS thread ID and name (https://github.com/open-telemetry/semantic-conventions/issues/812). This way we can solve the issue without making breaking "syntactic" changes in the scheme.

https://github.com/open-telemetry/semantic-conventions/pull/836#issuecomment-2023362943

pellared avatar Mar 27 '24 17:03 pellared

E.g. jvm.thread.id, jvm.thread.name for Java, go.goroutine.id for Go, python.asynciotask.name for Python, etc.

I would leave thread.id and thread.name for OS thread ID and name (https://github.com/open-telemetry/semantic-conventions/issues/812). This way we can solve the issue without making breaking "syntactic" changes in the scheme.

That's definitely a possible solution (not syntactically but semantically breaking).

As I wrote in https://github.com/open-telemetry/semantic-conventions/pull/836#issuecomment-2025446934, as we have breaking changes with either solution approach, we should aim at having them result in a comprehensive and consistent solution (that can ideally be declared stable) and avoid piece-meal breaking changes.

I'll bring it up in the next semconv SIG meeting, maybe it makes sense to pause work on this and make breaking changes when we are serious about stabilizing thread.* attributes.

pyohannes avatar Mar 28 '24 15:03 pyohannes

This was discussed in the semantic conventions SIG.

Given that thread.id is already widely used and that any of the proposed solutions would involve breaking changes, we postpone work on this for one until somebody has the cycles to propose, present, and discuss a coherent and comprehensive solution that encompasses both OS thread id and name and managed thread id and name and thus resolve both this issue and #812.

This should result in a solution that can then be declared stable.

pyohannes avatar Apr 15 '24 07:04 pyohannes