semantic-conventions
semantic-conventions copied to clipboard
Adjust `process.command_args` and `process.command_line` to be opt-in
By default, auto instrumentation ships process.command_args, which is very dangerous as a lot of java services pass in secrets via command line arguments. Example:
java \
-Dkeycloak.clientSecret="${KEYCLOAK_SECRET:-test}" \
-jar app.jar
I have opened an issue against the java auto instrumentation repo (https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/10151), but I was pointed to https://opentelemetry.io/docs/specs/semconv/resource/process/ which indicates that the information I am asking to be made opt-in is marked "Conditionally Required" in the specification.
Does it make sense to give an out for languages where passing in secrets via command line arguments is common? Curious also if other languages have this problem and how they deal with it.
@open-telemetry/semconv-system-approvers FYI
I am in favor of making these opt-in
@inssein would you like to send a PR to address it?
@joaopgrassi happy to, but I just have a quick clarifying question.
For the container resource, comand_line and command_args are both opt_in. Are we good with doing that to the process resource, or just command_args?
cc @breedx-splk since you had thoughts in https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/10151#issuecomment-1967818321
cc @breedx-splk since you had thoughts in open-telemetry/opentelemetry-java-instrumentation#10151 (comment)
Thanks @trask. Yeah, I don't love the motivation here...tho I'm probably outnumbered.
We already jump through so many hoops to mask and hide information that users shouldn't be exposing in the first place, and I think that can come at the expense of troubleshooting utility. Like especially in the java world, where configuration items are often passed on the commandline via system properties. Not being able to see an agent's configuration (via commandline) after the fact is frustrating....and I use the commandline to help troubleshoot quite often.
In the other thread, there is also mention of size/length/waste, which I think is maybe a stronger reason for considering making these opt-in (off by default). I also pointed out that I think this is probably fine, since users and vendors can always choose to override it.
From 2024-10-03 system semantic conventions we all agree that we should make this opt-in, we just need to implement this
Checking the attempt made by https://github.com/open-telemetry/semantic-conventions/pull/789 for this, I wonder if there is consensus here for this change. Right now the following attributes are conditionally required:
- process.executable.name
- process.executable.path
- process.command
- process.command_line
- process.command_args
If any of those is used the rest are optional if I understand this correctly. If we make the process.command* ones opt-in then we would need to require the process.executable.* ones? More or less what @Oberon00 mentioned at https://github.com/open-telemetry/semantic-conventions/pull/789#issuecomment-2011333274.
On the other hand it should be fine if we just keep process.command as is and only change the process.command_args and process.command_line to cover what the issue originally proposed.
@open-telemetry/semconv-system-approvers @open-telemetry/semconv-security-approvers any thoughts here?
Bumping this up again.
Also reading through https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/10151 I see the rationale of having process.command_line and process.command_args as opt-in so as to avoid confusion. process.executable.name, process.executable.path and process.command can still remain as is, conditionally_required.
@open-telemetry/semconv-system-approvers if there is no objection on the above I will file a PR to apply this change. Otherwise, if there are reasons for them to remain as is we will need to make this decision and close this issue since it's a ga blocker.
Another option could be something like db.query.text:
SHOULD NOT be collected by default unless there is sanitization that excludes sensitive data
Another option could be something like
db.query.text:SHOULD NOT be collected by default unless there is sanitization that excludes sensitive data
I would be in favor of going with this option instead of changing the requirement level.
I am in favor of this wording as well