opentelemetry-python

Resource Detectors produce blank Exception() after 5 seconds

Open jeremydvoss opened this issue 1 year ago • 4 comments

Describe your environment

Resource detectors get only 5 seconds to run by default. However, instead of ending the process after 5 seconds, a blank "Exception()" is created, resulting in the following warning:

Exception  in detector <opentelemetry.resource.detector.<RESOURCE DETECTOR CLASS> object at 0x000002B955D43D60>, ignoring

Note that the exception string is blank, resulting in 2 whitespaces: Exception__in... This blank exception is causing confusion among Azure customers. And perhaps more importantly, the process still hangs.

  Steps to reproduce

  1. Create a resource detector that sleeps for more than 5 seconds (try 10-20):
from opentelemetry.sdk.resources import Resource, ResourceDetector

from time import sleep

class SleepingResourceDetector(ResourceDetector):
    # pylint: disable=no-self-use
    def detect(self) -> "Resource":
        sleep(20)
        attributes = {}
        return Resource(attributes)
  2. Set the resource detector entry point in pyproject.toml:
[project.entry-points.opentelemetry_resource_detector]
sleeping = "opentelemetry.resource.detector.sleeping:SleepingResourceDetector"
  3. Install the package to register the entry point
  4. Point the SDK to the resource detector: export OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=sleeping
  5. Create a Resource: Resource.create(). This will eventually call get_aggregated_resources.

What is the expected behavior? The "concurrent future" setup should gracefully exit the process and not print out a confusing blank warning.

What is the actual behavior? The process hangs, and a confusing warning message with a blank error is logged.
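
Both symptoms can be reproduced without the SDK. The sketch below is a standalone approximation, assuming the detectors are submitted to a ThreadPoolExecutor inside a with block and each future is awaited with a timeout; the function name wait_on_slow_task is hypothetical and not part of opentelemetry-python.

```python
import concurrent.futures
import time


def wait_on_slow_task(sleep_s: float, timeout_s: float):
    """Return (exception message, seconds spent inside the with block)."""
    message = None
    start = time.monotonic()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        # Stand-in for a slow detector's detect() call.
        future = executor.submit(time.sleep, sleep_s)
        try:
            future.result(timeout=timeout_s)
        except Exception as ex:
            # The timeout error is raised without arguments,
            # so its string form is empty.
            message = str(ex)
    # Leaving the with block calls shutdown(wait=True), which joins the
    # worker thread, so the caller is blocked until the sleep finishes.
    elapsed = time.monotonic() - start
    return message, elapsed


if __name__ == "__main__":
    message, elapsed = wait_on_slow_task(sleep_s=1.0, timeout_s=0.2)
    print(f"message={message!r}, elapsed={elapsed:.1f}s")
```

Under these assumptions the message is an empty string, which would explain the doubled space in the warning, and the elapsed time tracks the sleep duration rather than the timeout, which would explain the hang.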


jeremydvoss, Jan 22 '24 21:01

Reopening since #3645 did not fully fix this issue.

ocelotl, Jan 25 '24 17:01

This seems similar to https://github.com/open-telemetry/opentelemetry-python/issues/3309: when processors or exporters block, we have no way to cancel them.

aabmass, Jan 25 '24 17:01

Some options for adding a timeout inside resource detectors:

  • Add subtype like ResourceDetectorWithTimeout
  • Unified Resource Detector Timeout env var
  • Add a getTimeout() method to the base class that detectors can use but are not required to
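
A rough sketch of how the second and third options might combine. Everything here is hypothetical: DetectorBase is a stand-in rather than the SDK's ResourceDetector, and EXAMPLE_RESOURCE_DETECTOR_TIMEOUT is an invented variable name, not an existing OpenTelemetry setting.

```python
import os

# Hypothetical environment variable name, used only for this sketch.
TIMEOUT_ENV_VAR = "EXAMPLE_RESOURCE_DETECTOR_TIMEOUT"
DEFAULT_TIMEOUT_S = 5.0


class DetectorBase:
    """Stand-in for a detector base class (not the real SDK class)."""

    def get_timeout(self) -> float:
        # Read the shared timeout, falling back to the default when the
        # variable is unset or not a number.
        raw = os.environ.get(TIMEOUT_ENV_VAR)
        if raw is None:
            return DEFAULT_TIMEOUT_S
        try:
            return float(raw)
        except ValueError:
            return DEFAULT_TIMEOUT_S

    def detect(self) -> dict:
        raise NotImplementedError


class HttpMetadataDetector(DetectorBase):
    """Hypothetical detector that opts in to the timeout."""

    def detect(self) -> dict:
        budget = self.get_timeout()
        # A real detector would pass `budget` to its blocking call,
        # for example as the timeout of an HTTP request.
        return {"example.timeout_budget_s": budget}
```

The point of the design is that the blocking call inside detect() gets bounded by the detector itself, since a thread that is already running cannot be cancelled from the outside.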

jeremydvoss, Jan 25 '24 17:01

A couple more issues:

  • While the detectors run in parallel, the timeout is applied sequentially, so the detectors that are waited on last can take the longest
  • We wait for all to finish before allowing app to continue. Even if we let the process finish, we should discard its result and let the app continue in parallel. @aabmass mentioned this may be possible by changing how the with block works. Will look through the docs.
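
One possible shape for addressing both points, sketched with the standard library only. The function name collect_with_shared_deadline and the dict-of-callables input are assumptions made for illustration, not the SDK's API. It uses concurrent.futures.wait for a single deadline shared by all tasks and shutdown(wait=False) instead of a with block; cancel_futures requires Python 3.9 or later.

```python
import concurrent.futures


def collect_with_shared_deadline(tasks, timeout_s):
    """Run callables in parallel under one shared deadline.

    `tasks` maps a name to a zero-argument callable. Returns a tuple
    (results for tasks that finished in time, sorted names of the rest).
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
    futures = {executor.submit(task): name for name, task in tasks.items()}

    # One deadline for the whole batch, not one timeout per future.
    done, not_done = concurrent.futures.wait(futures, timeout=timeout_s)

    # Do not join the workers: queued tasks are cancelled, and tasks that
    # are already running keep going in the background, result discarded.
    executor.shutdown(wait=False, cancel_futures=True)

    results = {}
    for future in done:
        if future.exception() is None:
            results[futures[future]] = future.result()
    timed_out = sorted(futures[future] for future in not_done)
    return results, timed_out
```

With this shape the caller gets control back once the deadline passes. Worker threads that are still running are not killed, though, and the interpreter still waits for them at exit, so a detector that never returns would still need its own internal timeout.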

jeremydvoss, Jan 25 '24 17:01