apm-agent-python
stacktrace frame vars reported as NaN
Describe the bug:
As part of the ops KPI review, we've been investigating the following errors in the ECS logs:
decode error: data read error: v2.errorRoot.Error: v2.errorEvent.Exception: v2.errorException.Stacktrace: []v2.stacktraceFrame: v2.stacktraceFrame.Vars: Read: unexpected value type: 0, error found in #10 byte of ...|length": [NaN, NaN, |..., bigger context ...| redacted ...", "common_length": [NaN, NaN, NaN, NaN, NaN, NaN, 27.0, 76.0, 76.0, 76|...
vars is defined as a flat mapping of local variables, but NaN is not a valid JSON value, so the server fails to decode the stacktrace frame.
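To show where the bare NaN token comes from: Python's standard json module has allow_nan=True by default, so it writes float NaN as NaN, which is not valid JSON. This is a minimal sketch that assumes the frame vars go through the stdlib json module. The frame_vars dict is a hypothetical stand-in for the redacted common_length value in the log.

```python
import json
import math

# Hypothetical local variables captured for a stack frame; the NaNs mimic
# the "common_length" list from the server's decode error.
frame_vars = {"common_length": [math.nan, math.nan, 27.0, 76.0]}

# With the default allow_nan=True, the stdlib emits the non-standard NaN token.
encoded = json.dumps(frame_vars)
print(encoded)  # {"common_length": [NaN, NaN, 27.0, 76.0]}

# A strict encoder refuses to produce that output and raises ValueError instead.
try:
    json.dumps(frame_vars, allow_nan=False)
    strict_failed = False
except ValueError:
    strict_failed = True
print("strict encoding failed:", strict_failed)  # strict encoding failed: True
```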
To Reproduce
I don't have steps to reproduce; this came up during the ops KPI review and was observed in the ECS logs.
Environment
The error has been observed in the following agent versions:
- apm-agent-python/6.18.0
- apm-agent-python/6.15.1
- apm-agent-python/6.13.2
- apm-agent-python/6.12.0
- apm-agent-python/6.10.0
- apm-agent-python/6.7.2
- elasticapm-python/5.10.1
- elasticapm-python/5.10.0
This happens on multiple apm-server versions.
Additional context
none
As discussed in Slack, we could probably fix this by adding a dependency on simplejson; however, the performance implications are unclear. Alternatively, we could serialize with allow_nan=False and drop the data when serialization raises.
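The allow_nan=False option could look roughly like the following sketch. It assumes each event is serialized on its own with the stdlib json module. serialize_or_drop is a hypothetical helper and not part of the agent's API. When an event contains a non-finite float, the helper drops it and returns None rather than sending invalid JSON to the server.

```python
import json
import logging
import math
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def serialize_or_drop(event: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: encode strictly and drop the event if that fails."""
    try:
        # allow_nan=False makes json.dumps raise ValueError on NaN, Infinity
        # and -Infinity (ValueError is also raised for circular references).
        return json.dumps(event, allow_nan=False)
    except ValueError:
        logger.warning("Dropping event that could not be encoded as valid JSON")
        return None


ok = serialize_or_drop({"vars": {"length": 27.0}})
dropped = serialize_or_drop({"vars": {"length": math.nan}})
print(ok)       # {"vars": {"length": 27.0}}
print(dropped)  # None
```

The trade-off is that the whole error event is lost, not just the offending variable.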
We are facing the same issue, and it is blocking our entire ML APM adoption. Has a fix been decided on, and is there an expected release? Any insights would be extremely helpful.
Will converting NaNs to null fix things for you? We can fix that shortly if that's sufficient.
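Converting NaNs to null could be done by cleaning the values before encoding. This is a minimal sketch under stated assumptions. replace_non_finite is a hypothetical helper that only walks dicts, lists and tuples, and the agent's actual fix may work differently. For comparison, simplejson gives the same result through its ignore_nan=True option, which writes NaN and ±Infinity as null.

```python
import json
import math
from typing import Any


def replace_non_finite(value: Any) -> Any:
    """Hypothetical helper: recursively replace NaN/Infinity floats with None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None  # json serializes None as null
    if isinstance(value, dict):
        return {key: replace_non_finite(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists, matching what json.dumps would emit anyway.
        return [replace_non_finite(item) for item in value]
    return value


frame_vars = {"common_length": [math.nan, math.inf, 27.0, 76.0]}
# allow_nan=False acts as a safety net: any non-finite float that slipped
# through would raise instead of producing invalid JSON.
encoded = json.dumps(replace_non_finite(frame_vars), allow_nan=False)
print(encoded)  # {"common_length": [null, null, 27.0, 76.0]}
```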
@xrmx In our case, yes, it will be extremely helpful.
With APM agent 6.22.0, you can change this behaviour by installing simplejson as a dependency and then setting the following environment variable:
ELASTIC_APM_TRANSPORT_JSON_SERIALIZER=elasticapm.utils.simplejson_encoder.dumps