[sql] Parse `db.operation.name` and `db.collection.name` from `db.query.text`
Component
OpenTelemetry.Instrumentation.SqlClient
What is the expected behavior?
The attributes db.operation.name and db.collection.name are conditionally required "if readily available".
Alternatively, they MAY be parsed from db.query.text. See relevant footnotes.
[3]: If readily available. The collection name MAY be parsed from the query text, in which case it SHOULD be the first collection name found in the query. [6]: If readily available. The operation name MAY be parsed from the query text, in which case it SHOULD be the first operation name found in the query.
These attributes are very valuable to end users.
Parsing db.query.text is a costly operation. What this issue does not attempt to address is whether parsing operation/collection from db.query.text should be on by default. For now, it should be implemented as opt-in, though later we should decide whether it should be on by default.
Ruby has an in-progress PR to add db.collection.name. Here is the regex:
# Capture the first word (including letters, digits, underscores, & '.', ) that follows common table commands
TABLE_NAME = /\b(?:(?:FROM|INTO|UPDATE)|(?:(?:CREATE|DROP|ALTER)\s+TABLE(?:\s+IF\s+(?:NOT\s+)?EXISTS)?))\s+["]?([\w.]+)["]?/i
Adding this note as more of a resource/ point of discussion when this ticket gets worked on. Might be a better way to grab this.
Parsing
db.query.textis a costly operation. What this issue does not attempt to address is whether parsing operation/collection fromdb.query.textshould be on by default. For now, it should be implemented as opt-in, though later we should decide whether it should be on by default.
@alanwest I think it would also be valuable to be able to override this from an application POV, perhaps with an API similar to SqlClientTraceInstrumentationOptions.Enrich(). The application may be able to better provide the db.operation.name (e.g. findAndModify as given in the examples here: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md#common-attributes) than what the SqlClient instrumentation code can determine on its own.
There's also the db.query.summary field (https://github.com/open-telemetry/opentelemetry-dotnet-contrib/issues/2238). Given that "The span name SHOULD be {db.query.summary} if a summary is available.", perhaps it would be even more important to be able to override this from user code (if the spec permits it). :thinking:
(My take on this: trying to move away from the span name on all trace data being just the database name. :sweat_smile: It doesn't look very good when looking at the data in an OpenTelemetry web UI)
Closing this issue because it is no longer necessary. Prior to stabilizing the DB conventions the db.query.summary attribute was introduced which better captures this information.
SQL instrumentation no longer sets uses db.collection.name or db.operation.name in most circumstances. There is no longer a requirement to parse them from the query text.