opentelemetry-java icon indicating copy to clipboard operation
opentelemetry-java copied to clipboard

Opt-in instead of opt-out for ProcessResourceProvider autoconfig

Open stnor opened this issue 4 years ago • 18 comments

Is your feature request related to a problem? Please describe. There is A LOT of static data sent in each trace with the otel.resource-prefix.

Describe the solution you'd like I'd like for this to be OPT-IN rather than opt out. I'm thinking most users wouldn't want this static info collected in EACH trace.

Describe alternatives you've considered The collection can be disabled.

https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/autoconfigure/src/main/java/io/opentelemetry/sdk/autoconfigure/OpenTelemetrySdkAutoConfiguration.java#L79

Additional context

        "otel.resource.telemetry.sdk.name": "opentelemetry",
        "otel.resource.process.runtime.description": "Amazon.com Inc. OpenJDK 64-Bit Server VM 11.0.11+9-LTS",
        "otel.resource.process.runtime.version": "11.0.11+9-LTS",
        "otel.resource.process.command_line": "/usr/lib/jvm/java-11-amazon-corretto:bin:java --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED -Djava.util.logging.config.file=/opt/nomp/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -javaagent:/opt/aws-opentelemetry-agent-1.1.0-nomp.jar -Dotel.resource.attributes=service.name=Nomp -Dotel.instrumentation.jedis.enabled=false -Dotel.instrumentation.spring-web.enabled=false -Dotel.traces.sampler=nomp -Denv=prod -Dfile.encoding=utf8 -Dnomp.loglevel=INFO -XX:+AlwaysPreTouch -Xms2500m -Xmx2500m -XX:MetaspaceSize=200m -XX:MaxMetaspaceSize=250m -XX:+DisableExplicitGC -XX:-OmitStackTraceInFastThrow -XX:+UseCompressedOops -XX:+UseG1GC -XX:ParallelGCThreads=3 -XX:ConcGCThreads=2 -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -Djava.security.egd=file:/dev/urandom -Djava.locale.providers=COMPAT,CLDR,SPI -DWEB_SOCKET_DISABLED=false -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -Dcatalina.base=/opt/nomp -Dcatalina.home=/opt/nomp -Djava.io.tmpdir=/opt/nomp/temp",
        "otel.resource.telemetry.auto.version": "1.1.0-nomp-aws",
        "otel.resource.process.executable.path": "/usr/lib/jvm/java-11-amazon-corretto:bin:java",
        "otel.resource.telemetry.sdk.version": "1.1.0",
        "otel.resource.os.description": "Linux 4.14.225-168.357.amzn2.x86_64",
        "otel.resource.os.type": "LINUX",
        "otel.resource.process.runtime.name": "OpenJDK Runtime Environment",
        "otel.resource.telemetry.sdk.language": "java",
        "otel.resource.cloud.provider": "aws",

stnor avatar May 18 '21 08:05 stnor

Thanks for this suggestion @stnor. I think it's a reasonable one. Do you have any ideas about what the API might look like for this, but still support users who would like the full-boat of resource information?

jkwatson avatar May 18 '21 16:05 jkwatson

This is the current opt-out mechanism afaik: -Dotel.java.disabled.resource.providers=io.opentelemetry.sdk.extension.resources.ProcessResourceProvider

so perhaps

-Dotel.java.enabled.resource.providers=io.opentelemetry.sdk.extension.resources.ProcessResourceProvider

to enable it again if you're asking about the Provider as a whole. Perhaps the static/verbose stuff can move to a separate Provder, and that can be enabled via the mechanism above.

I don't really know enough otel-java to come up with meaningful syntax for just turning off the meaningless (imho) stuff. Wouldn't an option just be to remove the code? Removed the from the spec even. I don't really understand the point of having these logged in traces, ever.

stnor avatar May 18 '21 19:05 stnor

This is the current opt-out mechanism afaik: -Dotel.java.disabled.resource.providers=io.opentelemetry.sdk.extension.resources.ProcessResourceProvider

so perhaps

-Dotel.java.enabled.resource.providers=io.opentelemetry.sdk.extension.resources.ProcessResourceProvider

to enable it again if you're asking about the Provider as a whole. Perhaps the static/verbose stuff can move to a separate Provder, and that can be enabled via the mechanism above.

I don't really know enough otel-java to come up with meaningful syntax for just turning off the meaningless (imho) stuff. Wouldn't an option just be to remove the code? Remove the from the spec even.I don't really understand the point of having these logged in traces, ever.

If you use OTLP it isn't in the traces, but in the Resource. What exporter are you using that puts every resource attribute onto every span?

jkwatson avatar May 18 '21 19:05 jkwatson

From my use of the Jaeger gRPC exporter, the Resource information is on every span as an attribute.

Should that not be the case?

kenfinnigan avatar May 18 '21 19:05 kenfinnigan

I am using a fork of the AWS OTEL-agent distro (to add my own Sampler) with their Docker sidecar to forward event to AWS X-Ray. The otel.resource attributes are there on every root span.

stnor avatar May 18 '21 19:05 stnor

From my use of the Jaeger gRPC exporter, the Resource information is on every span as an attribute.

Should that not be the case?

As I read the code, for Jaeger, the resource attributes are on the Jaeger Process, not on every span. https://github.com/open-telemetry/opentelemetry-java/blob/main/exporters/jaeger/src/main/java/io/opentelemetry/exporter/jaeger/JaegerGrpcSpanExporter.java#L152-L168

jkwatson avatar May 18 '21 19:05 jkwatson

Apologies @jkwatson, I confused myself because in the Jaeger UI it includes them, which makes sense. Agree it's not set on each span, just visualized that way

kenfinnigan avatar May 18 '21 19:05 kenfinnigan

So it it possible that in the AWS-case, its also just presented as such? @anuraaga

stnor avatar May 18 '21 19:05 stnor

I am using a fork of the AWS OTEL-agent distro (to add my own Sampler) with their Docker sidecar to forward event to AWS X-Ray. The otel.resource attributes are there on every root span.

That sounds like it might be an issue with AWS rather with OpenTelemetry. The Resource information is extremely important to identify the source of the produced spans. It's only "static" for a single instance, but across a whole fleet of instances, being able to distinguish between different versions, OS's, JDKs, SDKs, etc is generally very useful. What information do you think isn't important to know about the source of the spans?

jkwatson avatar May 18 '21 19:05 jkwatson

(FYI, my own opinion is that the full command-line should be opt-in, but the rest seem very good to me in general)

jkwatson avatar May 18 '21 19:05 jkwatson

What information do you think isn't important to know about the source of the spans?

I already know everything that's in the resource attributes I described above. I bet most people know they run Java on Linux, or that they run OTEL for example. :)

stnor avatar May 18 '21 19:05 stnor

What information do you think isn't important to know about the source of the spans?

I already know everything that's in the resource attributes I described above. I bet most people know they run Java on Linux for example. :)

And if your SREs decided to update the version of linux you were on, and it caused degraded performance, you'd probably want to be able to see the difference between the behaviors, wouldn't you?

jkwatson avatar May 18 '21 19:05 jkwatson

Yeah. perhaps in a larger operation than I run there is a point to that. What about these?

 "otel.resource.cloud.provider": "aws",
 "otel.resource.telemetry.sdk.name": "opentelemetry",
 "otel.resource.telemetry.sdk.language": "java",

My poInt: think it's just pretty verbose i general.

I could make the case for performance degradation by changing the java command -D flags for example, so its not like that is unimportant.

stnor avatar May 18 '21 19:05 stnor

Well, having operated at a company with a multi-cloud and polyglot language system, those definitely can be useful attributes for sure.

The long and short is that there isn't a one-size-fits-all solution to what Resource information is going to be useful to everyone. I do think it's fine to provide an option to disable all resource providers, but we definitely shouldn't go and remove all the Resource identification systems from the specification. The things are there because they were deemed by someone to be useful to them.

jkwatson avatar May 18 '21 19:05 jkwatson

Yeah, that's fair. I was concerned with them being in every span. Need to validate that with the AWS folks.

stnor avatar May 18 '21 19:05 stnor

@anuraaga and I chatted about this offline and propose this:

  • Every ResourceProvider should have a "name", akin to the names for exporters, samplers, etc.
  • You then do one of 3 things:
    • don't specify anything extra, get all the ResourceProviders
    • -Dotel.java.disabled.resource.providers=process,os to exclude the "process" and "os" named ResourceProviders
    • -Dotel.java.enabled.resource.providers=process,os to only include the "process" and "os" named ResourceProviders

Does that sound like a viable solution, @stnor ? If so, are you interested in implementing it? ;)

jkwatson avatar May 18 '21 23:05 jkwatson

@anuraaga and I chatted about this offline and propose this:

  • Every ResourceProvider should have a "name", akin to the names for exporters, samplers, etc.

  • You then do one of 3 things:

    • don't specify anything extra, get all the ResourceProviders
    • -Dotel.java.disabled.resource.providers=process,os to exclude the "process" and "os" named ResourceProviders
    • -Dotel.java.enabled.resource.providers=process,os to only include the "process" and "os" named ResourceProviders

Does that sound like a viable solution, @stnor ? If so, are you interested in implementing it? ;)

Hi, as discussed we want to have two flags right one for enabling and one for disabling, what should be the behavior if the user by mistake adds both the flag? Ex: Case 1:

-Dotel.java.disabled.resource.providers=process
-Dotel.java.enabled.resource.providers=os

Case 2:

-Dotel.java.disabled.resource.providers=os
-Dotel.java.enabled.resource.providers=os

Practically adding both flags doesn't make sense but what if someone adds by mistake?

Gaurang-Patel avatar Sep 08 '21 04:09 Gaurang-Patel

My vote would be to fail fast if both options are specified.

jkwatson avatar Sep 08 '21 04:09 jkwatson

Resolved in #4231.

jack-berg avatar Oct 03 '22 17:10 jack-berg