Clarify logs processors error handling expectations
Describe your environment
OS: (e.g, MacOS) Python version: (e.g., Python 3.9.6) SDK version: (e.g., 1.33.1) API version: (e.g., 1.33.1)
What happened?
Unhandled exception occurs when LogExporter.export() is called and LogRecordProcessor did not guard against exceptions, which causes exception to bubble up and crash program. It should be LogExporter job to guard against that not LogRecordProcessor.
Steps to Reproduce
Custom processor:
from opentelemetry.sdk._logs import LogRecordProcessor
class RawLogProcessor(LogRecordProcessor):
def __init__(self, exporter):
self.exporter = exporter
def emit(self, log_data):
self.exporter.export([log_data])
def shutdown(self):
if hasattr(self.exporter, 'shutdown'):
self.exporter.shutdown()
def force_flush(self, timeout_millis=30000):
if hasattr(self.exporter, 'force_flush'):
return self.exporter.force_flush(timeout_millis)
return True
Reproduction program:
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from raw_log_processor import RawLogProcessor
import logging
otlp_exporter = OTLPLogExporter(endpoint="http://10.255.255.1:4318/v1/logs", timeout=1)
logger_provider = LoggerProvider()
logger_provider.add_log_record_processor(RawLogProcessor(otlp_exporter))
set_logger_provider(logger_provider)
logger = logging.getLogger(__name__)
logger.addHandler(LoggingHandler(logger_provider=logger_provider))
logger.setLevel(logging.INFO)
logger.propagate = False
logger.info("test")
print("If you see this, the logging worked!")
Expected Result
According to OTEL principles https://opentelemetry.io/docs/specs/otel/error-handling/#basic-error-handling-principles, SDK should not throw unhandled exceptions. For example, if ConnectTimeout occurs during OTLPLogExporter._export() method, LogExporter should handle it and not throw.
Actual Result
SDK does not handle exception which bubble up and cause program to crash.
Exception: requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='10.255.255.1', port=4318): Max retries exceeded with url: /v1/logs (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x104316a30>, 'Connection to 10.255.255.1 timed out. (connect timeout=1)'))
Additional context
The fix would be to handle exceptions in each LogExporter export method implementation.
Would you like to implement a fix?
Yes
Tip
React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1 or me too, to help us triage it. Learn more here.
I wasn't able to reproduce with your snippet. The exceptions you are saying are like this?
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=9999): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x100fb4190>: Failed to establish a new connection: [Errno 61] Connection refused'))
I wasn't able to reproduce with your snippet. The exceptions you are saying are like this?
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=9999): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x100fb4190>: Failed to establish a new connection: [Errno 61] Connection refused'))
this is the error i am seeing
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='10.255.255.1', port=4318): Max retries exceeded with url: /v1/logs (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x1061c9280>, 'Connection to 10.255.255.1 timed out. (connect timeout=1)'))
~could you try using this branch https://github.com/open-telemetry/opentelemetry-python-contrib/pull/3565 to reproduce error? could it be that is happening because export is not wrapped inside try block like it is case for SimpleLogRecordProcessor https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/export/init.py#L120?~
@emdneto i've simplified snipper for reproduction. could you try again please?
The way the SDK is implemented today, the error handling should happen in processor implementations. For example https://github.com/open-telemetry/opentelemetry-python/blob/b06cf803605dd323ad134a9a434a204bf7fb150f/opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/export/init.py#L120-L123
For the batch processors, everything is already happening in a background thread so exceptions can never bubble up to the caller.
It should be
LogExporterjob to guard against that notLogRecordProcessor.
I'm trying to understand the ask here.
- Do you expect the OTLP exporter's
export()should never raise an exception? - Or that the error handling should be more robustly done in the SDK Logger and Tracer implementations?
- Or just that we improve the documentation for the python SDK?
Option 2 sounds like a compelling change to make, so that it is easier for users to implement their own processors.
hey @aabmass, thanks for a response!
my understanding reading the docs was that LogExporter's export() should not raise exception because it can propagate further - in this case to LogRecordProcessor. however, since that seems to be the pattern for python sdk i don't mind.
my use case was that i implemented processor but didn't handle possible LogExporter exceptions which caused program to crash.
i'd close this issue, but i agree, option 2 sounds like a good change.
Thanks! I think we can improve the docs for implementors.
Let's keep the issue open until we improve the docs then.