opentelemetry-ruby-contrib
opentelemetry-ruby-contrib copied to clipboard
Feat: Add Mysql2 and Trilogy `db.collection.name` attribute
The db.collection.name attribute is conditionally required for Database spans. This PR uses regex to detect the collection name and report it as the db.collection.name attribute for Mysql2 and Trilogy instrumentation.
Closes #1100
@hannahramadan Thank you for your patience waiting for my response.
We discussed some of the details during the SIG on 2024-10-08 which I will share here:
Default to omit
Adding additional regular expressions to extract data adds more overhead than I would like in high volume systems that are performance sensitive. I would like to make it so that users opt-in to this attribute.
Using Mixed Schema Versions
I think I mentioned this already in a separate comment, but I believe that our instrumentations are still emitting pre-1.0 schema attributes. I would have expected the attribute name to be db.sql.table and then once we had the DB Semconv Stability functionality it would emit db.collection.name and/or db.sql.table depending on the user selected option.
I am of the opinion it may be a bit more confusing in this case to have the db.statement attribute along with the db.collection.name, when users I think would expect to see db.query.text instead; so, I think it is best to align on using deprecated attributes until we introduce the use of semconv stability.
Is that not what we want?
Config options are tied to a Schema Version
I think we set a bad precedent with setting individual attribute options because the names are now tied to a specific schema version. I am not certain what to do about that.
Do we continue to have options when we move things along?
Remove Structural Duplication
Identical code is now going to appear in multiple instrumentations, which means that if we need to make changes it will require applying them to multiple locations.
I think that it would be best to extract this functionality into a common library that could be shared across DB instrumentations.
Though the gem is currently named sql-obfuscation I think it is the best place to include this functionality, where instead of it solely being responsible for sanitizing SQL, the gem could provide a helper that extracts or enriches a DB client span. It could use the instrumentation config to make decisions about what attributes to include and what logic to apply.
Something like this:
SqlProcessor.append_db_attributes(span, sql, config)
Given that this is more of a client side processor than it is a query sanitizer, then I think we should rename the gem to reflect that it is processing SQL and generating attributes for it.
An additional benefit to removing structural duplication is that it would reduce the number of places where we would have to add any DB attribute semconv stability compatibility.
Are Span Processors out of the question?
This is something we did not discuss.
With the introduction of OnEnding, the specification leaves room for us to do some Span Processing outside of the instrumentation code.
This would mean that instrumentations could all emit the SQL and then the SqlProcessor could be run in OnEnding and potentially reduce some of the complexity in the DB instrumentations.
The processor could extract attributes from the SQL and sanitize or omit the statements. This would change configurations so that these options could not be configured on the instrumentations anymore but rather in the span processor.
I think that trade off may be acceptable since it is unlikely we will want to have different configurations for different SQL datastores.
All that being said
We agreed that I would unblock this on the condition that this attribute be omitted by default and move forward with structural duplication and refactor the code in a future PR.
Thank you @arielvalentin and apologies for the delayed response here!
To your point, I think it may make sense to nail down how we want the design to look and then add this attribute.
I think that it would be best to extract this functionality into a common library that could be shared across DB instrumentations.
Perhaps we:
- Rename the
sql-obfuscationtosql-processor - Move the logic to collect table/collection name there. Attributes that require sql parsing logic will also live there
For table/collection name attribute:
- Default to omit - I agree this should be and opt-in attribute.
- Using Mixed Schema Versions - Also a good call out. Agree.
- Config options are tied to a Schema Version - We talked about potentially having levels, like collector internal metrics. I'm not sure there are enough configs/options to to warrant grouping into levels, but what do we think about a config that accepts any array of values to collect? Old and new semantic versions could be accepted and aliased.
👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot
👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot
👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot
Since this PR was opened, guidelines were updated (https://github.com/open-telemetry/semantic-conventions/commit/3fa52acca3ec1530f9b88a38a29bfd876ec2205d) to NOT extract the collection name from db.query.text when the database system supports query text with multiple collections in non-batch operations. This PR is doing that and should be closed.