'None' values cause exporting to fail
Describe your environment
opentelemetry-sdk 1.22.0
opentelemetry-exporter-otlp-proto-grpc 1.22.0
Python 3.10
Steps to reproduce
- Start a Collector as per the steps outlined here: https://opentelemetry-python.readthedocs.io/en/latest/examples/logs/README.html
- Run the following Python program
import logging
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
OTLPLogExporter,
)
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
logger_provider = LoggerProvider()
set_logger_provider(logger_provider)
exporter = OTLPLogExporter(insecure=True)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)
logging.getLogger().addHandler(handler)
# Log works, but is never sent to Collector due to batch-processing failing
logging.warning({"hello": "world"})
# Broken, crashes batch processing
logging.warning({"hello": None})
# Log works, but is never sent to Collector due to batch-processing failing
logging.warning({"hello": "other world"})
handler.flush()
# Log works since previous batch was flushed already
logging.warning({"hello": "new world"})
logger_provider.shutdown()
- Observe the following output in the Collector:
LogRecord #0
ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-02-07 13:57:39.384327168 +0000 UTC
SeverityText: ERROR
SeverityNumber: Error(17)
Body: Str(Exception while exporting logs.)
Attributes:
-> exception.type: Str(Exception)
-> exception.message: Str(Invalid type <class 'NoneType'> of value None)
-> exception.stacktrace: Str(Traceback (most recent call last):
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\sdk\_logs\_internal\export\__init__.py", line 312, in _export_batch
self._exporter.export(self._log_records[:idx]) # type: ignore
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\exporter\otlp\proto\grpc\_log_exporter\__init__.py", line 108, in export
return self._export(batch)
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\exporter\otlp\proto\grpc\exporter.py", line 263, in _export
request=self._translate_data(data),
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\exporter\otlp\proto\grpc\_log_exporter\__init__.py", line 105, in _translate_data
return encode_logs(data)
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\exporter\otlp\proto\common\_internal\_log_encoder\__init__.py", line 38, in encode_logs
return ExportLogsServiceRequest(resource_logs=_encode_resource_logs(batch))
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\exporter\otlp\proto\common\_internal\_log_encoder\__init__.py", line 61, in _encode_resource_logs
pb2_log = _encode_log(sdk_log)
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\exporter\otlp\proto\common\_internal\_log_encoder\__init__.py", line 47, in _encode_log
body=_encode_value(log_data.log_record.body),
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\exporter\otlp\proto\common\_internal\__init__.py", line 78, in _encode_value
values=[_encode_key_value(str(k), v) for k, v in value.items()]
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\exporter\otlp\proto\common\_internal\__init__.py", line 78, in <listcomp>
values=[_encode_key_value(str(k), v) for k, v in value.items()]
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\exporter\otlp\proto\common\_internal\__init__.py", line 85, in _encode_key_value
return PB2KeyValue(key=key, value=_encode_value(value))
File "C:\Users\jvm\.virtualenvs\pythonProject\lib\site-packages\opentelemetry\exporter\otlp\proto\common\_internal\__init__.py", line 81, in _encode_value
raise Exception(f"Invalid type {type(value)} of value {value}")
Exception: Invalid type <class 'NoneType'> of value None
)
Trace ID:
Span ID:
Flags: 0
LogRecord #1
ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-02-07 13:57:39.385586944 +0000 UTC
SeverityText: WARNING
SeverityNumber: Warn(13)
Body: Map({"hello":"new world"})
Trace ID:
Span ID:
Flags: 0
What is the expected behavior?
I expected to see all my logs in the Collector, including a log for the {"hello": None} object that fails to be encoded at the moment
What is the actual behavior? As indicated by step 3 above, the exporter fails to encode the logs, specifically the None value. As we are performing batch processing, this fails the entire batch, causing the rest of the logs to never arrive at the Collector
Additional context
There's probably two issues at hand here:
- Why do None values cause encoding to fail? Is this according to spec or simply an error?
- Is it possible to have the rest of the batch succeed even when one element fails encoding to ensure as many logs as possible survive?
- I guess they'll be allowed only since the next specification release see https://github.com/open-telemetry/opentelemetry-specification/commit/a0aee78e704d30e3cb2ce34823c347e521741e94 . Said that if a string is expected maybe it would be better to cast it to its representation?
BTW per python doc the first argument should be a string https://docs.python.org/3/library/logging.html#logging.Logger.debug
- I'm no expert but when using a Batch log record processor having the whole batch lost because of an error is reasonable
Why do None values cause encoding to fail? Is this according to spec or simply an error?
Attribute values of null are not valid and attempting to set a null value is undefined behavior. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#attribute
https://github.com/open-telemetry/opentelemetry-proto/blob/9d139c87b52669a3e2825b835dd828b57a455a55/opentelemetry/proto/common/v1/common.proto#L25
Cheers, thanks for the answers :)
Hey, it looks like the upstream specification allows for null values as of version 1.31.0 now. Is this something that might come to the library at some stage do ye think? Thanks in advance.
I'd be in favor of taking another look at this. As mentioned above, it seems that there are two issues -- one is what does it mean for a log to be out of spec, and the other is what should be done with a batch of logs if just one of them is out of spec (I think it would be nice to keep the good ones).