aspire
aspire copied to clipboard
Oracle OpenTelemetry support
Closes #4072 by updating to the new ODP.Net version and adding OTel support via Oracle.ManagedDataAccess.OpenTelemetry. As mentioned in the issue this will need a new package and bumped version adding to the NuGet feeds.
Used the tracing settings taken from the Oracle sample to ensure that EF Core SQL text shows is provided to the traces.
Added an EndToEnd test using the Npgsql one as a base.
Microsoft Reviewers: Open in CodeFlow
@mitchdenny or @danmoseley are you able to take a look at this?
Adding @eerhardt for the Otel side of things.
Importing the Otel package to internal feed so we can build this.
Importing the Otel package to internal feed so we can build this.
@mitchdenny or @eerhardt are you able to add Testcontainers.Oracle to the internal feed as well to get this compiling again.
@mitchdenny or @eerharor @eerhardt are you able to add Testcontainers.Oracle to the internal feed as well to get this compiling again.
Importing that package now. Should be there in 30 mins.
@eerhardt @sebastienros would you be able to take a look with the failing tests? I've got them working on my machine, and made some progress on working out what's needed on the them running on the build agent (adding the Helix reference helped) but it looks like the ones that use Docker are still all failing. I did adjust the TriggerActivity to try and be obvious about connection issues to the database, but it feels like this is something else.
@radical The Helix tests are failing on things related to this PR, can you sport something wrong with how the new tests are written?
Looking at the test results the first test fails with:
Oracle might be taking a longer time to start up, or failing at some point? I would try to add some waits to confirm that the db is up, and accessible. This might be helpful too.
Also, there is https://github.com/dotnet/aspire/issues/3161 but it doesn't use testcontainers, and instead depends on the AppHost starting the container.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
It looks like we aren't getting the database icon to show up in the dashboard:
compare that to SqlServerEndToEnd:
The reason appears to be because there is no peer.service attribute/tag on the span:
vs.
@JamesNK - is there a reason why the span has to have peer.service for the DB icon to show up? Can't it having db.system be enough?
peer.service indicates the span goes to an external app. The icon only appears in that situation.
The value of peer.service is used to map to a resource with the same address and the name is then included in the UI, e.g. sql1.
server.address and server.port also cause the icon to be displayed: https://github.com/dotnet/aspire/blob/09d5efb49e258fe54e15578ff3b37cbbabe48e79/src/Aspire.Dashboard/Otlp/Model/OtlpHelpers.cs#L195-L224
They're both recommend attributes in database trace semantic conventions - https://opentelemetry.io/docs/specs/semconv/database/database-spans/
I went through the different non-default settings of the OracleDataProviderInstrumentationOptions, and setting EnableConnectionLevelAttributes = true does provide:
That results in the addition of db.name, db.user, server.address and server.port being set on the spans. Goes back to the earlier discussion about whether we should be setting non-default values or not. If they're recommended values then that is perhaps a case for enabling that setting. Alternatively the approach I've got at the moment is allowing the user to inject settings and we mention this in the documentation, happy to take a steer either way.
Those conventions are useful in clarifying the issue around the SQL statement (SetDbStatementForText) and that it probably should be opt in if users are confident their SQL isn't exposing sensitive values.
Meanwhile back to trying to work out why TestContainers isn't working...
Looking at the test results the first test fails with:
Oracle might be taking a longer time to start up, or failing at some point? I would try to add some waits to confirm that the db is up, and accessible. This might be helpful too.
Also, there is #3161 but it doesn't use testcontainers, and instead depends on the AppHost starting the container.
Hi @radical after much bashing away at this I managed to unearth that the tests are failing with:
Oracle.ManagedDataAccess.Client.OracleException (0x80004005): ORA-50201: Oracle Communication: Failed to connect to server or failed to parse connect string
---> OracleInternal.Network.NetworkException (0x80004005): ORA-50201: Oracle Communication: Failed to connect to server or failed to parse connect string
---> OracleInternal.Network.NetworkException (0x80004005): ORA-12514: Cannot connect to database. Service XEPDB1 is not registered with the listener at host 127.0.0.1/127.0.0.1 port 5432. (CONNECTION_ID=c4G0SvajqUmwqX6okJZmQA==)
That ORA-12514 code led me to https://github.com/gvenzl/oci-oracle-xe/issues/225 where it seems there might be issues with different networking setups. I don't know anything about Helix - is it potentially doing something along the lines of network=host?
The wait strategy in place on the container and the output logs at the point of the exception show that the container db has reached a point where it should work. In previous runs I've tried adding infinite waits on fail that went all the way up to the test timeout with no luck either, so it shouldn't be a case of the database still coming up.
That results in the addition of db.name, db.user, server.address and server.port being set on the spans.
db.name and db.user aren't standard parameters. Making them opt-in makes sense.
server.address and server.port are part of the standard and is important information. I think the underlying Oracle library should include them by default. Until that happens, updating Aspire component to set configuration to include them is the next best option.
That results in the addition of db.name, db.user, server.address and server.port being set on the spans.
db.nameanddb.useraren't standard parameters. Making them opt-in makes sense.
server.addressandserver.portare part of the standard and is important information. I think the underlying Oracle library should include them by default. Until that happens, updating Aspire component to set configuration to include them is the next best option.
@alexkeh is adding the two server properties something you're able to raise?
@cmdkeen - Is setting EnableConnectionLevelAttributes = true enough to get all the OTel semantic convention attributes added to the trace? If yes, and if there are no security concerns with what information this setting enables, then setting that by default for now makes sense to me. We can remove it when the Oracle provider adds the OTel semantic convention attributes by default.
@eerhardt looks to be, from the playground with that property turned on results in this for the sendExecuteRequest span:
I'll make that the default for now, with a comment that it can be removed if/when the Oracle library supports.
In other happy news it looks like the Helix Docker tests are finally passing!
That results in the addition of db.name, db.user, server.address and server.port being set on the spans.
db.nameanddb.useraren't standard parameters. Making them opt-in makes sense.server.addressandserver.portare part of the standard and is important information. I think the underlying Oracle library should include them by default. Until that happens, updating Aspire component to set configuration to include them is the next best option.@alexkeh is adding the two server properties something you're able to raise?
@cmdkeen Let me check with my dev team.
@mitchdenny and @eerhardt anything more you want doing on this?
That results in the addition of db.name, db.user, server.address and server.port being set on the spans.
db.nameanddb.useraren't standard parameters. Making them opt-in makes sense.server.addressandserver.portare part of the standard and is important information. I think the underlying Oracle library should include them by default. Until that happens, updating Aspire component to set configuration to include them is the next best option.@alexkeh is adding the two server properties something you're able to raise?
@cmdkeen Let me check with my dev team.
@cmdkeen I got feedback from my dev team. There are a couple reasons ODP.NET OTel does not expose server.address and server.port by default.
-
Security - Organizations can be sensitive to have these two property values written to traces inadvertently and unknowingly. Having the user set
EnableConnectionLevelAttributesmakes it unambiguous. -
Other OTel DB provider behavior - We observed that Microsoft SqlClient provider doesn’t trace the equivalent
server.addressandserver.porttags by default either. It requiresEnableConnectionLevelAttributesset to true as well. Its default value is false. That applies to the MySQL provider as well.
Other OTel DB provider behavior - We observed that Microsoft SqlClient provider doesn’t trace the equivalent server.address and server.port tags by default either. It requires EnableConnectionLevelAttributes set to true as well. Its default value is false.
When EnableConnectionLevelAttributes is not enabled, Microsoft.SqlClient will emit peer.service though. See this code.
internal void AddConnectionLevelDetailsToActivity(string dataSource, Activity sqlActivity)
{
if (!this.EnableConnectionLevelAttributes)
{
sqlActivity.SetTag(SemanticConventions.AttributePeerService, dataSource);
}
else
{
...
That applies to the MySQL provider as well.
For the MySqlConnector library (which we have an Aspire component for), the reason the icon shows up is because MySqlConnector adds the net.peer.name and net.peer.port tags to the Activity.
And then the OTLP exporter appends peer.service to the tag list, using the above 2 tags.
That applies to the MySQL provider as well.
For the MySqlConnector library (which we have an Aspire component for), the reason the icon shows up is because MySqlConnector adds the
net.peer.nameandnet.peer.porttags to the Activity.And then the OTLP exporter appends
peer.serviceto the tag list, using the above 2 tags.
The Oracle provider doesn't have the peer.service attribute. If we support this attribute, we'll plan to emit it by default.
@eerhardt given the clarification above do you have a steer on what needs doing to get this merged?
- Back to the default
EnableConnectionLevelAttributes- for now devs will have to manually choose whether to enable. In time the Oracle library might addpeer.servicesupport. - Stick with
EnableConnectionLevelAttributes=trueby default and document that it is configurable noting the security concern
Happy either way - suspect given the security element that 1 is the sensible choice.
suspect given the security element that 1 is the sensible choice.
Yeah, that is where I'm leaning. There are reasons this is the default from Oracle, following suite is the better choice.
Oracle might be taking a longer time to start up, or failing at some point? I would try to add some waits to confirm that the db is up, and accessible. This might be helpful too. Also, there is #3161 but it doesn't use testcontainers, and instead depends on the AppHost starting the container.
Hi @radical after much bashing away at this I managed to unearth that the tests are failing with:
Oracle.ManagedDataAccess.Client.OracleException (0x80004005): ORA-50201: Oracle Communication: Failed to connect to server or failed to parse connect string ---> OracleInternal.Network.NetworkException (0x80004005): ORA-50201: Oracle Communication: Failed to connect to server or failed to parse connect string ---> OracleInternal.Network.NetworkException (0x80004005): ORA-12514: Cannot connect to database. Service XEPDB1 is not registered with the listener at host 127.0.0.1/127.0.0.1 port 5432. (CONNECTION_ID=c4G0SvajqUmwqX6okJZmQA==)That ORA-12514 code led me to gvenzl/oci-oracle-xe#225 where it seems there might be issues with different networking setups. I don't know anything about Helix - is it potentially doing something along the lines of network=host?
Sorry, I missed your message earlier. Do IIUC that the issue is fixed for the testcontainers case, but you are still seeing the above with EndToEnd tests which depend on the test app starting the containers?
Is it potentially doing something along the lines of network=host?
And is this something that would hinder it? I'm not sure if this is used when starting the container. https://docs.docker.com/network/drivers/host/ says that network=host would mean the container's network is not isolated from the host itself. I would guess that it shouldn't be using this, but I'm not sure.
Is there some way to figure that out from within the container?
Hi @radical - thanks for getting back. That networking issue never popped up for the EndToEnd tests in playground as they run locally. I managed to fix it for the Helix tests by using .WithHostname("localhost") so the tests are working. Without a detailed understanding of how Helix is configuring the networking settings I didn't want to dig further into what the problem might be. TBH if it's working now perhaps best to let it lie, at least as long as it seems to be a problem specific to Oracle TestContainer.
Hi @sebastienros and @eerhardt - apologies for leaving this, it's holiday season in Europe, though no-one has told the weather here... Have reverted back to the default ODP.Net tracing options, while providing the option to specify if users do want the extra detail and are happy with the security trade-off in their environment. Assuming the builds all go green this should be in a position for a code review.
