
OpenTelemetry Container App AzureVMResourceDetector Exception

Open · Freshchris01 opened this issue on Nov 27 '23 · 12 comments

  • Package Name: azure-monitor-opentelemetry
  • Package Version: 1.1.0
  • Operating System: Linux
  • Python Version: 3.9

Describe the bug
Auto-instrumentation of a FastAPI Python app in Azure Container Apps results in an error: WARNING:Exception in detector <opentelemetry.resource.detector.azure.vm.AzureVMResourceDetector object at 0x7f011e043ac0>, ignoring. The error pointing to the Azure VM detector seems random to me.

To Reproduce
Steps to reproduce the behavior:

  1. Create a Container App Environment and configure its logging to Azure Monitor, using the same Log Analytics workspace you want to use for Application Insights.
  2. Create a Container App from an image containing the code in the appendix.
  3. Set the environment variable APPLICATIONINSIGHTS_CONNECTION_STRING to the connection string of a valid Application Insights instance.
  4. The replica does not activate and logs: WARNING:Exception in detector <opentelemetry.resource.detector.azure.vm.AzureVMResourceDetector object at 0x7f011e043ac0>, ignoring

main.py:

from typing import Union
import logging
import os

from opentelemetry.trace import (
    SpanKind,
    get_tracer_provider,
    set_tracer_provider,
)
from opentelemetry.propagate import extract

from fastapi import FastAPI, HTTPException

from azure.monitor.opentelemetry import configure_azure_monitor

# Import the tracing api from the `opentelemetry` package.
from opentelemetry import trace

logging.basicConfig(format = "%(asctime)s:%(levelname)s:%(message)s", level = logging.WARNING)
logger = logging.getLogger(__name__)

# Configure OpenTelemetry to use Azure Monitor with the specified connection
# string.
configure_azure_monitor(
    # connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"],
    connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING"),
    instrumentation_options={"azure_sdk": {"enabled": False}},
    disable_offline_storage=True,
)

tracer = trace.get_tracer(__name__,
                          tracer_provider=get_tracer_provider())

# Start a new span with the name "hello". This also sets this created span as the current span in this context. This span will be exported to Azure Monitor as part of the trace.
with tracer.start_as_current_span("hello"):
    print("Hello, World!")


app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}

@app.get("/testroute")
def read_root():
    with tracer.start_as_current_span(
        "error_test_request",
        #context=extract(request.headers),
        kind=SpanKind.SERVER
    ):
        logger.error("Error test endpoint was reached. . .")
        raise HTTPException(status_code=500, detail="Inernal Error")
        return {"message": "Error Test!"}


@app.get("/items/{item_id}")
def read_item(item_id: int, q: Union[str, None] = None):
    return {"item_id": item_id, "q": q}

Dockerfile:

FROM python:3.9
WORKDIR /
COPY ./requirements.txt /requirements.txt
RUN pip install --no-cache-dir --upgrade -r /requirements.txt
COPY ./ /
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]

requirements.txt:

annotated-types==0.6.0
anyio==3.7.1
asgiref==3.7.2
azure-core==1.29.5
azure-core-tracing-opentelemetry==1.0.0b11
azure-monitor-opentelemetry==1.1.0
azure-monitor-opentelemetry-exporter==1.0.0b19
certifi==2023.11.17
charset-normalizer==3.3.2
click==8.1.7
Deprecated==1.2.14
exceptiongroup==1.2.0
fastapi==0.104.1
fixedint==0.1.6
h11==0.14.0
httptools==0.6.1
idna==3.4
importlib-metadata==6.8.0
isodate==0.6.1
msrest==0.7.1
oauthlib==3.2.2
opentelemetry-api==1.21.0
opentelemetry-instrumentation==0.42b0
opentelemetry-instrumentation-asgi==0.42b0
opentelemetry-instrumentation-dbapi==0.42b0
opentelemetry-instrumentation-django==0.42b0
opentelemetry-instrumentation-fastapi==0.42b0
opentelemetry-instrumentation-flask==0.42b0
opentelemetry-instrumentation-psycopg2==0.42b0
opentelemetry-instrumentation-requests==0.42b0
opentelemetry-instrumentation-urllib==0.42b0
opentelemetry-instrumentation-urllib3==0.42b0
opentelemetry-instrumentation-wsgi==0.42b0
opentelemetry-resource-detector-azure==0.1.0
opentelemetry-sdk==1.21.0
opentelemetry-semantic-conventions==0.42b0
opentelemetry-util-http==0.42b0
packaging==23.2
pydantic==2.5.2
pydantic_core==2.14.5
python-dotenv==1.0.0
PyYAML==6.0.1
requests==2.31.0
requests-oauthlib==1.3.1
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
typing_extensions==4.8.0
urllib3==2.1.0
uvicorn==0.24.0.post1
uvloop==0.19.0
watchfiles==0.21.0
websockets==12.0
wrapt==1.16.0
zipp==3.17.0

Expected behavior
The Container App sends logs and metrics to Azure Monitor and the replica is healthy.

Screenshots: [screenshot attached in the original issue]

Freshchris01 · Nov 27 '23

When I comment out the following part, the container replica is healthy and I can call the API.

configure_azure_monitor(
    # connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"],
    connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING"),
    instrumentation_options={"azure_sdk": {"enabled": False}},
    disable_offline_storage=True,
)

The error also appears with azure-monitor-opentelemetry==1.0.0. I also deployed the container to a Web App Service; there, the test errors and request information were forwarded to Application Insights as expected.

Freshchris01 · Nov 27 '23

Hey, thanks for the feedback. @lzchen, @jeremydvoss, do you know what's causing this issue?

pvaneck · Nov 27 '23

@pvaneck No. The problem here is that, for some reason, the error is not actually explained. Notice the double space where the error message should show; it is as if a blank "Exception()" is raised somewhere. That being said, OpenTelemetry's resource aggregation is designed to call the detectors and ignore any that fail, so this does not affect the running app.
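
For context, a minimal sketch of that ignore-on-failure behavior, assuming the SDK's get_aggregated_resources helper is what runs the detectors (the FailingDetector below is a hypothetical stand-in for the VM detector):

from opentelemetry.sdk.resources import Resource, ResourceDetector, get_aggregated_resources


class FailingDetector(ResourceDetector):
    """Hypothetical detector that always fails, standing in for the VM detector."""

    def detect(self) -> Resource:
        raise RuntimeError("metadata endpoint not reachable")


# With the default raise_on_error=False, the SDK logs a warning for the failing
# detector, ignores it, and still returns an aggregated resource, so startup continues.
resource = get_aggregated_resources([FailingDetector()])
print(resource.attributes)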

jeremydvoss · Dec 05 '23

We have released azure-monitor-opentelemetry==1.1.1. This includes the ability to disable resource detectors. For instance, to disable the VM detector but leave the App Service detector on, set the environment variable OTEL_EXPERIMENTAL_RESOURCE_DETECTORS="azure_app_service".

Let me know if this solves the problem.
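
For anyone following along, a minimal sketch of the workaround, assuming the variable only needs to be set before configure_azure_monitor is called (setting it in the Container App's environment variables is equivalent and usually preferable):

import os

from azure.monitor.opentelemetry import configure_azure_monitor

# Run only the App Service detector; skip the VM detector that produces the warning.
os.environ.setdefault("OTEL_EXPERIMENTAL_RESOURCE_DETECTORS", "azure_app_service")

configure_azure_monitor(
    connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING"),
)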

jeremydvoss · Dec 05 '23

Thanks @jeremydvoss. For now we have deployed version 1.0.0b16. We will update to the new version, and I'll give an update here.

Freshchris01 · Dec 06 '23

Hi @Freshchris01. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] · Dec 07 '23

> We have released azure-monitor-opentelemetry==1.1.1. This includes the ability to disable resource detectors. For instance, to disable the VM detector but leave the App Service detector on, set the environment variable OTEL_EXPERIMENTAL_RESOURCE_DETECTORS="azure_app_service".
>
> Let me know if this solves the problem.

What should we set in Azure Container Apps? The same OTEL_EXPERIMENTAL_RESOURCE_DETECTORS="azure_app_service"?

litan1106 · Dec 14 '23

@litan1106

Yes, set it to the same value for now. This is just a workaround to block the error message from appearing.

lzchen · Dec 14 '23

> Yes, set it to the same value for now. This is just a workaround to block the error message from appearing.

Yes, we are using OTEL_EXPERIMENTAL_RESOURCE_DETECTORS="azure_app_service" locally as well, since we also use Application Insights locally.

litan1106 · Dec 14 '23

Hi @Freshchris01, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

github-actions[bot] · Dec 22 '23

Just writing to say I have the same issue, with almost exactly the same setup as OP and azure-monitor-opentelemetry = "^1.1.1". I fixed it for now by hardcoding the OTEL_EXPERIMENTAL_RESOURCE_DETECTORS env variable, but this is suboptimal.

lukebuehler · Jan 08 '24

@Freshchris01 Please try with the latest version of opentelemetry-resource-detector-azure==0.1.1 https://pypi.org/project/opentelemetry-resource-detector-azure/0.1.1/

jeremydvoss · Jan 10 '24

Hi @Freshchris01, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

github-actions[bot] · Jan 18 '24

The new azure-monitor-opentelemetry==1.2.0 includes a couple of improvements to the resource detector. If you are still having issues, upgrade to this version and let us know if your issue persists.

jeremydvoss · Jan 22 '24

> The new azure-monitor-opentelemetry==1.2.0 includes a couple of improvements to the resource detector. If you are still having issues, upgrade to this version and let us know if your issue persists.

Hello, the warning still appears, but the Function App does not fail. However, I can't be certain whether it is preventing logs from being propagated to Application Insights (in my case).

vipinanandcpp · Jan 22 '24

@vipinanandcpp I'll be a bit clearer: resource detector errors such as the AzureVMResourceDetector warning you mentioned are ignored and do not crash the app or prevent telemetry from flowing. There was a bug that could cause the app to hang inside AzureVMResourceDetector, preventing it from starting; that should be fixed by the recent release. However, if telemetry is still not exporting to Application Insights after this fix, there is likely an unrelated issue with your setup.

jeremydvoss · Jan 22 '24

Regarding the confusing warning message itself, I believe I have discovered the source: the concurrent.futures machinery that the OpenTelemetry SDK uses to run the resource detectors does not work correctly for detector calls that take longer than 5 seconds.

I have made an issue in the OTel repo to track this.

Again, this is unrelated to whether your telemetry is exporting or not.
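
To illustrate why the message looks blank: a concurrent.futures timeout raises a TimeoutError whose string form is empty, so a log format along the lines of "Exception %s in detector %s, ignoring" renders with a double space. A minimal, self-contained sketch of that effect (the sleep stands in for a slow detector):

import concurrent.futures
import time


def slow_detect():
    # Stand-in for a detector call that takes longer than the 5 second limit.
    time.sleep(6)


with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(slow_detect)
    try:
        future.result(timeout=5)
    except concurrent.futures.TimeoutError as exc:
        # str(exc) is an empty string, which is why the logged warning shows no detail.
        print("Exception %s in detector %s, ignoring" % (exc, "AzureVMResourceDetector"))
    # Leaving the "with" block still waits for slow_detect to finish before exiting.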

jeremydvoss · Jan 22 '24

The issue stems from an unclear timeout in the OTel SDK. My fix will be in the next release. To avoid triggering the 5 second timeout, the VM resource detector now sets its own timeout to 4 seconds. Please update to opentelemetry-resource-detector-azure==0.1.3.
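
Roughly what that change amounts to, as an illustrative sketch rather than the detector's actual code: the VM detector queries the Azure Instance Metadata Service (IMDS), and giving that request its own timeout shorter than the SDK's 5 seconds lets it fail fast in environments like Container Apps where IMDS is not reachable. The endpoint below is the well-known IMDS address; the api-version and function name are illustrative only:

import json
import urllib.request

# Well-known Azure Instance Metadata Service endpoint; the api-version is illustrative.
_IMDS_URL = "http://169.254.169.254/metadata/instance/compute?api-version=2021-02-01"


def detect_vm_metadata(timeout_s: float = 4.0):
    """Return VM metadata, or None if IMDS can't be reached within timeout_s."""
    request = urllib.request.Request(_IMDS_URL, headers={"Metadata": "true"})
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # Not on an Azure VM (e.g., in Container Apps): give up before the SDK's
        # 5 second detector timeout is hit, instead of hanging.
        return None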

jeremydvoss · Jan 25 '24