
[BUG] The OpenTelemetry INTERNAL span reported for BlobClientBase.exists() is marked as an error and attaches an exception stack trace

Open trask opened this issue 1 year ago • 3 comments

Calling BlobClientBase.exists() produces two spans (which is expected):

an INTERNAL span:

SpanData{spanContext=ImmutableSpanContext{traceId=92c3d060568c79c1fafbc818297ea4af, spanId=b98cca7f7c875f09, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=00000000000000000000000000000000, spanId=0000000000000000, traceFlags=00, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=false}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.42.1"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=azure-storage-blob, version=12.28.0, schemaUrl=https://opentelemetry.io/schemas/1.17.0, attributes={}}, name=AzureBlobStorageBlob.getPropertiesNoCustomHeaders, kind=INTERNAL, startEpochNanos=1729222313878000000, endEpochNanos=1729222319944232700, attributes=AttributesMap{data={thread.id=1, thread.name=main, az.namespace=Microsoft.Storage}, capacity=128, totalAddedValues=3}, totalAttributeCount=3, events=[ImmutableExceptionEventData{epochNanos=1729222319942186100, exception=com.azure.storage.blob.implementation.models.BlobStorageExceptionInternal: Status code 404, ContainerNotFound, additionalAttributes={}, spanLimits=SpanLimitsValue{maxNumberOfAttributes=128, maxNumberOfEvents=128, maxNumberOfLinks=128, maxNumberOfAttributesPerEvent=128, maxNumberOfAttributesPerLink=128, maxAttributeValueLength=2147483647}}], totalRecordedEvents=1, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=ERROR, description=}, hasEnded=true}

and a nested CLIENT span:

SpanData{spanContext=ImmutableSpanContext{traceId=92c3d060568c79c1fafbc818297ea4af, spanId=0bc936d658ee0899, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=92c3d060568c79c1fafbc818297ea4af, spanId=b98cca7f7c875f09, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, resource=Resource{schemaUrl=null, attributes={service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.42.1"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=azure-storage-blob, version=12.28.0, schemaUrl=https://opentelemetry.io/schemas/1.17.0, attributes={}}, name=HEAD, kind=CLIENT, startEpochNanos=1729222313971242900, endEpochNanos=1729222316895604300, attributes=AttributesMap{data={applicationinsights.internal.operation_name=AzureBlobStorageBlob.getPropertiesNoCustomHeaders, http.request.resend_count=1, http.url=https://trasktest.blob.core.windows.net/test/test, thread.id=1, az.client_request_id=98c0ae84-0dd9-457d-99fa-fb31220c71aa, az.service_request_id=f0b0a3d5-001e-0018-6a0e-2176da000000, server.port=443, http.method=HEAD, thread.name=main, server.address=trasktest.blob.core.windows.net, http.status_code=404, az.namespace=Microsoft.Storage}, capacity=128, totalAddedValues=12}, totalAttributeCount=12, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=ERROR, description=404}, hasEnded=true}

It's not surprising that the CLIENT span has status ERROR, since it's probably captured by lower-level HTTP instrumentation, which doesn't know that a 404 is an expected response code for this operation.

What's surprising is that the INTERNAL span has status ERROR and attaches an (often large) exception stack trace, even though the call to BlobClientBase.exists() doesn't throw an exception but instead just returns false when the blob is not found.

repro at https://github.com/trask/azure-blob-storage-test

cc @lmolkova @jeanbisutti @heyams @harsimar

trask avatar Oct 18 '24 03:10 trask

@ibrahimrabab @ibrandes @kyleknap @seanmcc-msft

github-actions[bot] avatar Oct 18 '24 03:10 github-actions[bot]

Thank you for your feedback. Tagging and routing to the team member best able to assist.

github-actions[bot] avatar Oct 18 '24 03:10 github-actions[bot]

Switching ownership to Core as Storage doesn't do anything special with span creation.

exists(), and the equivalent speculative APIs, have client-side handling to return a better response when possible. For example, Storage Blob doesn't have a specific API for checking the existence of a blob (or container), so it attempts getProperties and catches a 404 response, returning false in that case. So, from the REST API perspective the call failed, but from the application perspective it didn't. That may need to be dug into, since tracing and the runtime are reporting different results.
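To make the pattern described above concrete, here is a minimal, self-contained sketch of "call getProperties and translate a 404 into false". Every type and method in it (`ExistsSketch`, `StorageException`, the in-memory `store`) is a hypothetical stand-in for illustration and is not the actual azure-storage-blob implementation.

```java
import java.util.Map;

// Hypothetical stand-ins for illustration only; not azure-storage-blob types.
class ExistsSketch {

    /** Stand-in for a service exception that carries the HTTP status code. */
    static class StorageException extends RuntimeException {
        final int statusCode;

        StorageException(int statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }
    }

    /** Stand-in for the REST-level call: throws a 404 when the blob is missing. */
    static Map<String, String> getProperties(Map<String, Map<String, String>> store, String blobName) {
        Map<String, String> properties = store.get(blobName);
        if (properties == null) {
            throw new StorageException(404, "BlobNotFound");
        }
        return properties;
    }

    /** Speculative existence check: a 404 becomes false, anything else propagates. */
    static boolean exists(Map<String, Map<String, String>> store, String blobName) {
        try {
            getProperties(store, blobName);
            return true;
        } catch (StorageException e) {
            if (e.statusCode == 404) {
                return false; // expected outcome for this operation, not an application error
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> store = Map.of("test", Map.of("Content-Type", "text/plain"));
        System.out.println(exists(store, "test"));    // true
        System.out.println(exists(store, "missing")); // false
    }
}
```

In this sketch the exception is created and thrown before `exists` swallows it, so any instrumentation wrapped around the inner `getProperties` call would still observe a failure. That is the mismatch between tracing and runtime described in this comment.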

alzimmermsft avatar Oct 18 '24 17:10 alzimmermsft

This is a false positive, right? How can I prevent the error from appearing in the logs? I can't really ignore the whole namespace com.azure.storage.blob.implementation.models.*.

Is it safe to hide com.azure.storage.blob.implementation.models.BlobStorageExceptionInternal?

nkoudelia avatar Jan 28 '25 12:01 nkoudelia

It should be solvable similarly to how it's done in .NET: https://github.com/Azure/azure-sdk-for-net/pull/33639

  • storage code provides an error classifier on a per-operation basis
  • core has access to it and uses it when reporting logs (it affects severity), traces, and metrics: if a status code is not classified as an error, the span status remains unset
  • we should be able to codegen it to some extent.
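As a rough sketch of what the list above could look like, the snippet below models a per-operation error classifier that core-level tracing consults before setting span status. All names here (`ErrorClassifierSketch`, `EXPECTED_STATUS_CODES`, `spanStatusFor`, the operation names used as keys) are hypothetical; they are not existing azure-core or azure-storage-blob APIs, and the real design may differ.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names exist in azure-core or azure-storage-blob.
class ErrorClassifierSketch {

    /** Simplified span status: UNSET means the span is not marked as failed. */
    enum SpanStatus { UNSET, ERROR }

    /** Per-operation table of 4xx/5xx codes the operation treats as a normal outcome (assumed values). */
    static final Map<String, Set<Integer>> EXPECTED_STATUS_CODES = Map.of(
        "exists", Set.of(404),
        "deleteIfExists", Set.of(404));

    /** A response is an error only if it is >= 400 and not expected for this operation. */
    static boolean isError(String operation, int statusCode) {
        if (statusCode < 400) {
            return false;
        }
        return !EXPECTED_STATUS_CODES.getOrDefault(operation, Set.of()).contains(statusCode);
    }

    /** What the tracing layer would do: leave status unset unless the classifier says error. */
    static SpanStatus spanStatusFor(String operation, int statusCode) {
        return isError(operation, statusCode) ? SpanStatus.ERROR : SpanStatus.UNSET;
    }

    public static void main(String[] args) {
        System.out.println(spanStatusFor("exists", 404));        // UNSET
        System.out.println(spanStatusFor("getProperties", 404)); // ERROR
        System.out.println(spanStatusFor("exists", 500));        // ERROR
    }
}
```

Keeping the table as data rather than logic is what would make the "codegen it to some extent" point plausible: the expected status codes per operation could, in principle, be emitted by the generator.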

Related to https://github.com/Azure/azure-sdk-for-java/issues/25157

lmolkova avatar Jul 28 '25 22:07 lmolkova