opentelemetry-python
opentelemetry-python copied to clipboard
Record logger name as the instrumentation scope name
As spec'd here https://github.com/open-telemetry/opentelemetry-specification/pull/2359
A follow up question for this feature -
Currently LoggingHandler holds a single _logger object which uses a single scope name. Since the same handler may be used by multiple loggers with different names, it will require initiating new internal Logger objects on demand.
Is this a design change we are willing to make?
Correct, the broader agreement was to maintain a map of scope: logger and use them. It would be great if we could have benchmarking numbers and memory requirements.
Thanks @srikanthccv . Is there a reference for this decision? What kind of benchmarks do you mean?
I don't know if it's ever written somewhere in the issues, but there was a slack channel where this discussion happened. I can probably look it ups and link here.
Here are the benchmarks java did when it had to make this decision https://github.com/open-telemetry/opentelemetry-specification/pull/1236#issuecomment-984058622.
@tm0nk can you link the Java PR for this that you mentioned? I haven't been super involved in the Logging API/SDK but I'm a little surprised that a bridging API doesn't allow passing the instrumentation scope name directly in the LogRecord.
Would that be a possible implementation in our API instead of having to cache a Logger object that serve no purpose beyond holding the instrumentation scope? Any previous spec discussion?
@aabmass The analysis I mentioned for opentelemetry-java was done by Jack Berg, and he summarized it well in this comment: https://github.com/open-telemetry/opentelemetry-specification/pull/1236#issuecomment-984058622
This comment makes it sound like they chose the path of not using a cache at all.
However, the implementation of keeping one logger per instrumentation scope in the latest opentelemetry-java release (v1.37.0) does seem to be using a form of cache. It appears ComponentRegistry<SdkLogger> is being used for this purpose:
https://github.com/open-telemetry/opentelemetry-java/blob/v1.37.0/sdk/logs/src/main/java/io/opentelemetry/sdk/logs/SdkLoggerProvider.java#L63
Can you describe in more detail the approach you have in mind that implements this in a bridging API?
Will bring this up during the SIG meeting.
@tm0nk I think this Java code is analogous to this issue. It looks like your PR, they just get a new logger for each log entry. I think it makes sense to have caching in the API to avoid creating a bunch of garbage. I guess I was just suprised there is no escape hatch for this kind of thing. We have MetricProducer in the Metrics SDK which allows emitting a batch of metrics including instrumentation scope directly.
Hi @lzchen I was planning to pick this issue up to add the requested cache tests and benchmarks to the initial PR from @tm0nk. I noticed that an attributes argument was recently added added to InstrumentationScope and thus get_logger. https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/init.py#L655
attributes is an unhashable type (Mapping), so the original @lru_cache approach no longer works for caching. What is the expected behavior if a user has a Logger with the same instrumentation scope name but different attributes? Is it best to avoid caching entirely, or should we create a custom hash and caching logic that only uses some arguments (e.g. name, version, and schema_url).