opentelemetry-python
Resource Detectors produce blank Exception() after 5 seconds
Describe your environment
Resource detectors only get 5 seconds to run by default. However, instead of ending the process after 5 seconds, a blank "Exception()" is created, resulting in the following warning:
Exception in detector <opentelemetry.resource.detector.<RESOURCE DETECTOR CLASS> object at 0x000002B955D43D60>, ignoring
Note that the exception string is blank, resulting in two consecutive spaces (shown here as underscores): Exception__in...
This blank exception is confusing Azure customers. Perhaps more importantly, the process still hangs.
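The blank message is what you get when a concurrent.futures timeout is caught by a generic `except Exception` handler and formatted with `%s`: `future.result(timeout=...)` raises a `TimeoutError` with no arguments, so its string form is empty. The standard-library sketch below reproduces both symptoms. It assumes the SDK formats the warning roughly as "Exception %s in detector %s, ignoring" (inferred from the message above, not copied from the SDK source), and `slow_detect` is a hypothetical stand-in for a detector's `detect()`.

```python
import concurrent.futures
import time
from typing import Tuple


def slow_detect() -> dict:
    # Hypothetical stand-in for a detector's detect() that outlives the timeout.
    time.sleep(1.0)
    return {}


def aggregate_with_timeout(timeout: float) -> Tuple[str, float]:
    start = time.monotonic()
    message = ""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(slow_detect)
        try:
            future.result(timeout=timeout)
        except Exception as ex:  # broad catch, as the warning suggests
            # The TimeoutError raised by result() has no args, so %s renders "".
            message = "Exception %s in detector %s, ignoring" % (ex, "sleeping")
    # Leaving the with block calls executor.shutdown(wait=True), which blocks
    # until slow_detect() finishes, even though its result is thrown away.
    elapsed = time.monotonic() - start
    return message, elapsed


if __name__ == "__main__":
    msg, elapsed = aggregate_with_timeout(timeout=0.2)
    print(repr(msg))
    print(f"returned after {elapsed:.1f}s")
```

Calling `aggregate_with_timeout(0.2)` returns the message 'Exception  in detector sleeping, ignoring' (note the double space), and the call takes about one second rather than 0.2 seconds, because exiting the `with` block waits for the sleeping thread.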
Steps to reproduce
- Create a resource detector that sleeps for more than 5 seconds (try 10-20):
from opentelemetry.sdk.resources import Resource, ResourceDetector
from time import sleep

class SleepingResourceDetector(ResourceDetector):
    # pylint: disable=no-self-use
    def detect(self) -> "Resource":
        sleep(20)
        attributes = {}
        return Resource(attributes)
- Set the resource detector entry point in pyproject.toml:
[project.entry-points.opentelemetry_resource_detector]
sleeping = "opentelemetry.resource.detector.sleeping:SleepingResourceDetector"
- Install the package so that its entry point is registered
- Point the SDK to the resource detector:
export OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=sleeping
- Create a Resource:
Resource.create()
This eventually calls get_aggregated_resources.
What is the expected behavior? The concurrent.futures setup should gracefully end the process after the timeout and not print a confusing blank warning.
What is the actual behavior? The process hangs, and a confusing warning with a blank error message is printed.
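For the warning half of the expected behavior, one option is to catch the timeout separately from other exceptions and name it explicitly. The sketch below uses hypothetical names (`run_detector` and the message wording) and treats detectors as plain callables; it only improves the message and does not stop the hang, which depends on how the executor is shut down.

```python
import concurrent.futures
import logging
import time

logger = logging.getLogger("detector_sketch")


def run_detector(executor, detect, name: str, timeout: float) -> dict:
    """Run detect() in the executor; return its attributes, or {} on failure."""
    future = executor.submit(detect)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Name the timeout explicitly instead of printing an empty exception.
        logger.warning(
            "Detector %s took longer than %s seconds, skipping", name, timeout
        )
    except Exception as ex:  # pylint: disable=broad-except
        # %r keeps the exception type visible even when its message is empty.
        logger.warning("Exception %r in detector %s, ignoring", ex, name)
    return {}
```

On Python 3.11+, `concurrent.futures.TimeoutError` is the built-in `TimeoutError`, so a detector that raises its own `TimeoutError` would also land in the first branch.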
Additional context
Reopening since #3645 did not fully fix this issue.
This seems similar to https://github.com/open-telemetry/opentelemetry-python/issues/3309: when processors or exporters block, we have no way to cancel them.
Some options for adding a timeout inside resource detectors:
- Add a subtype such as ResourceDetectorWithTimeout
- Add a unified resource detector timeout environment variable
- Add a getTimeout() method to the base class that detectors can use but are not required to
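The second and third options could be combined: a base class exposes the timeout, read from an environment variable, and detectors that do blocking I/O pass it to their own calls so they give up on their own instead of relying on the executor. This is a standalone sketch rather than a subclass of the SDK's ResourceDetector. TimeoutAwareDetector, get_timeout(), MetadataDetector and the OTEL_RESOURCE_DETECTOR_TIMEOUT variable are all hypothetical names, and detect() returns a plain dict instead of a Resource.

```python
import os
import urllib.request


class TimeoutAwareDetector:
    """Hypothetical base class that exposes the aggregation timeout."""

    # Hypothetical env var name; not an existing OpenTelemetry setting.
    TIMEOUT_ENV_VAR = "OTEL_RESOURCE_DETECTOR_TIMEOUT"
    DEFAULT_TIMEOUT = 5.0

    def get_timeout(self) -> float:
        # Detectors may use this, but nothing forces them to.
        raw = os.environ.get(self.TIMEOUT_ENV_VAR)
        if raw is None:
            return self.DEFAULT_TIMEOUT
        try:
            return float(raw)
        except ValueError:
            return self.DEFAULT_TIMEOUT

    def detect(self) -> dict:
        raise NotImplementedError


class MetadataDetector(TimeoutAwareDetector):
    """Hypothetical detector that bounds its own blocking I/O."""

    def __init__(self, url: str):
        self.url = url  # hypothetical metadata endpoint

    def detect(self) -> dict:
        try:
            with urllib.request.urlopen(self.url, timeout=self.get_timeout()) as resp:
                return {"metadata.status": resp.status}
        except OSError:
            # URLError and socket timeouts are OSError subclasses: give up
            # within the timeout and report no attributes.
            return {}
```

The timeout passed to urlopen applies to individual socket operations, not to the whole request, so it bounds a stalled connection rather than guaranteeing a total runtime.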
A couple more issues:
- While detectors run in parallel, the timeout is applied sequentially, so the detectors waited on last can take the longest.
- We wait for all detectors to finish before allowing the app to continue. Even if we let a detector keep running, we should discard its result and let the app continue in parallel. @aabmass mentioned this may be possible by changing how the with block works. I will look through the docs.
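Both points could be addressed by waiting against one shared deadline with concurrent.futures.wait() instead of calling future.result(timeout=...) once per future (where N slow detectors can add up to N × timeout), and by shutting the executor down without waiting instead of leaving a with block, whose exit calls shutdown(wait=True). This is a minimal standard-library sketch, assuming detectors are plain callables returning dicts. aggregate is a hypothetical name, not the SDK's get_aggregated_resources, and cancel_futures requires Python 3.9+.

```python
import concurrent.futures
import time
from typing import Callable, Dict, List


def aggregate(detectors: List[Callable[[], Dict]], timeout: float) -> Dict:
    """Merge detector results, waiting at most `timeout` seconds in total."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
    futures = [executor.submit(detect) for detect in detectors]
    # One shared deadline for all detectors instead of `timeout` per future.
    done, _not_done = concurrent.futures.wait(futures, timeout=timeout)
    # Don't block on stragglers: cancel queued work and return immediately.
    # Already-running threads cannot be killed; they finish in the background
    # and their results are simply discarded.
    executor.shutdown(wait=False, cancel_futures=True)
    merged: Dict = {}
    for future in futures:  # keep detector order for deterministic merging
        if future in done and future.exception() is None:
            merged.update(future.result())
    return merged
```

Python cannot kill a running thread, so a detector that never returns keeps running in the background, and the interpreter still joins that worker thread at exit. The app itself, however, continues as soon as the shared deadline passes.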