profiles: `profile_id` is required
I'm trying to understand why Profile::profile_id is required according to the spec. I see why one would assign a unique ID to a profile; yet, I don't see why this behavior should be enforced. I think the spec should leave this choice to the user, and simply assume two profiles not to be related if profile_id is not set.
The main reason to remove this from the spec is that generating good UUIDs is not trivial, especially if utility functions are not available in the standard library (e.g. C++11). Forcing people to fallback to bad UUIDs does more harm than good in this case.
Perhaps the ID can be generated by the collector? How do other signals handle it? I agree that it would be nice not to require the profiling implementations to generate a UUID.
@open-telemetry/profiling-maintainers @tigrannajaryan
I suppose the main use case for this field is safe idempotent retries on failed profile uploads? If yes, then making the field optional and auto-populated by the collector seems reasonable to me.
Perhaps the ID can be generated by the collector? How do other signals handle it? I agree that it would be nice not to require the profiling implementations to generate a UUID.
I could not find any similar concept in other signals, except for trace_id of course.
Perhaps the ID can be generated by the collector?
This may make the collector mandatory for a substantial number of folks.
I suppose receivers can always calculate a hash of the incoming data which should work for deduplication as well. It might just be a bit annoying in cases where the gRPC framework doesn't let you access the raw data before turning it into objects.
What would the path forward look like in this case? I could not find any reference to profile_id being required in opentelemetry-specification, except for the copy of profiles.proto in OTEP 0239.
Would a PR in opentelemetry-proto be sufficient to propose the change for more feedback?
I raised the discussion about this issue in the opentelemetry-specification SIG last Tuesday. People there generally agreed on the proposal to make profile_id optional, and that keeping it for deduplication makes sense.
Logs can be assigned an ID for deduplication, but it's an attribute rather than a field in the Protobuf definition. This is documented in semantic-conventions. So the question would be, do we make the field optional, or we make it an attribute for profiles in semantic conventions?
Let me know if I forgot something @jsuereth
What is the purpose of profile ID? Does it serve any purpose? Rather than making it optional, can it be deleted outright?
What is the purpose of profile ID? Does it serve any purpose? Rather than making it optional, can it be deleted outright?
#665 added a clarifying comment on when the field may be useful.
I think you refer to:
It may be used for deduplication and signal correlation purposes.
Can that be done via a resource attribute or a profile attribute then?
I think you refer to:
It may be used for deduplication and signal correlation purposes.Can that be done via a resource attribute or a profile attribute then?
Perhaps subjectively, but I think it's clearer and more explicit as a dedicated field. Similar to trace_id in trace signal. I don't have super strong opinion either way though about this particular field though.
That also implies profile_id is the correlation mechanism you will use. But how does that work in practice? How do you have twice the same profile ID? What correlation can you derive from 16 bytes?
Say you instead use service.instance.id per semantic convention to correlate - now that makes sense. Say you use k8s.pod.uid to correlate or dedup or aggregate - all those attributes are semantic convention attributes which are adopted across signals and allow correlation and coordination. Even adding trace_id as an attribute, similar to logs, is better here to map a trace to profiles.