apm-server Metrics without any explicit mapping can lead to document rejection

Some APM clients support metrics that are unknown to APM Server (e.g. Micrometer for the APM Java Agent). APM Server knows neither the field names nor their types.

Because these fields have no default mapping, their type is decided dynamically by the first document indexed after a rollover or index creation. A field can therefore be mapped as float instead of long (or the opposite), and documents that arrive later with the other type are rejected.

Some examples:

hikaricp_connections_acquire.count
jvm_memory_max
jvm_buffer_total_capacity
jvm.gc.time

This behavior is especially easy to observe when Logstash is in the middle, since the rejections appear as 400 errors in the logs.

The problem is not easy to tackle. Could we default to float instead of long for integer values (e.g. using a dynamic template)?
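A minimal sketch of what such a dynamic template could look like, written in Go so the mapping body can be printed and inspected. The template name `long_metrics_as_float` and the helper `numericAsFloatMappings` are hypothetical, and the template matches every dynamically detected long field. A real setup would probably restrict it to metric fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// numericAsFloatMappings returns a mappings fragment with one dynamic template.
// The template name is hypothetical; "match_mapping_type" and "mapping" are the
// standard Elasticsearch dynamic template keys.
func numericAsFloatMappings() map[string]interface{} {
	return map[string]interface{}{
		"dynamic_templates": []interface{}{
			map[string]interface{}{
				"long_metrics_as_float": map[string]interface{}{
					// Applies to fields whose first value looks like an integer.
					"match_mapping_type": "long",
					// Map them as float so later fractional values are accepted.
					"mapping": map[string]interface{}{"type": "float"},
				},
			},
		},
	}
}

func main() {
	body, err := json.MarshalIndent(
		map[string]interface{}{"mappings": numericAsFloatMappings()}, "", "  ")
	if err != nil {
		panic(err)
	}
	// Print the JSON body that could be placed in an index template.
	fmt.Println(string(body))
}
```

Note that float is a 32-bit type, so very large counters would lose precision with this choice.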

Jul 12 '21 11:07 lucabelluccini

@lucabelluccini with the new data streams scheme, service-specific metrics are sent to service-specific data streams, and only common metrics share a data stream. I believe this issue can be closed now as a side effect of the new indexing strategy. Do you agree?

Mar 15 '22 17:03 simitt

Hello @simitt. Data streams certainly help: they avoid mapping conflicts when the same metric is used by different services with different types.

Still, the problem could occur even when the metric is reported by the same agent/service.

The most common cause is the way Go's JSON serialization represents floating-point numbers. See this Go playground: https://play.golang.com/p/tPYQZFGLajT

Example

If jvm.gc.time is 1.0, it will be serialized as 1 by APM-Server. Elasticsearch will default to long. If the next jvm.gc.time is 0.9, it will be rejected.
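The serialization behavior described above can be reproduced with a small sketch using Go's standard `encoding/json` package. The helper `encodeMetric` is hypothetical, and this assumes APM Server's output matches the standard library here. It is not APM Server's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeMetric serializes one metric sample as a JSON object using the
// standard library encoder (hypothetical helper for illustration).
func encodeMetric(name string, value float64) string {
	b, err := json.Marshal(map[string]float64{name: value})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A float64 with no fractional part is written without a decimal point,
	// so a dynamic mapping would see an integer.
	fmt.Println(encodeMetric("jvm.gc.time", 1.0)) // {"jvm.gc.time":1}

	// A later sample with a fractional part is written as a decimal number.
	fmt.Println(encodeMetric("jvm.gc.time", 0.9)) // {"jvm.gc.time":0.9}
}
```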

Mar 15 '22 17:03 lucabelluccini

I think we should probably map all simple numeric metrics as double, to avoid this issue.

Mar 16 '22 06:03 axw