apm icon indicating copy to clipboard operation
apm copied to clipboard

Support metrics with dots in their names

Open SylvainJuge opened this issue 4 years ago • 37 comments

Describe the bug

As of version 1.18.0, the java agent added instrumentation of micrometer framework, which allows to capture metrics managed with this framework. While the reported issue is specific to HikariCP connection pool, it might happen with other frameworks, or could be user-provided.

Initial investigation of the issue shows that some metric names are problematic and conflict with metrics storage. This issue has been opened in the apm repository as the actual change required might spread over multiple repositories and the solution to fix it properly isn't defined yet.

This was reported in our public forum here : https://discuss.elastic.co/t/failed-to-transform-sample-model-sample/249962

Currently, you can't have two metrics where one is the prefix of another with . as separator, connections=3 and connections.idle=2 will conflict and at least one of them will be ignored as one expects connections to be a metric value, and the other expects it to be an object with an idle property.

To Reproduce

Follow instructions described in the forum to reproduce it with the Java agent.

Any agent that will sent two metrics with names that conflict with each other will have a similar issue.

Expected behavior

Metric names should not have any impact on our ability to capture them.

Debug logs

Server debug logs (provided in the forum)
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.idle", Value:2, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.creation.count", Value:600, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.creation.sum.us", Value:7.0216e+07, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.timeout", Value:6, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.pending", Value:0, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.creation.sum.us", Value:5.7659e+07, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.active", Value:0, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.idle", Value:2, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.creation.count", Value:511, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.usage.count", Value:128944, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.usage.sum.us", Value:1.983578e+09, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.max", Value:50, Values:[]float64(nil), Counts:[]int64(nil)}
Agent logs with log_level=trace

Relevant part where the issue is:

"hikaricp.connections":{"value":3.0},"hikaricp.connections.timeout":{"value":0.0}

Full log

Sep 28 16:34:03 env[3405743]: {"metricset":{"timestamp":1601310843073000,"tags":{"pool":"HikariPool-2"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":46},"hikaricp.connections.usage.sum.us":{"value":409000.0},"hikaricp.connections":{"value":3.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections.acquire.count":{"value":46},"hikaricp.connections.acquire.sum.us":{"value":159244.109},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.idle":{"value":3.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}
Sep 28 16:34:03 env[3405743]: {"metricset":{"timestamp":1601310843073000,"tags":{"pool":"HikariPool-1"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":50},"hikaricp.connections.usage.sum.us":{"value":889000.0},"hikaricp.connections.acquire.count":{"value":50},"hikaricp.connections.acquire.sum.us":{"value":217912.23},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections":{"value":3.0},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.idle":{"value":3.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}
Sep 28 16:34:33 env[3405743]: {"metricset":{"timestamp":1601310873073000,"tags":{"pool":"HikariPool-2"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":60},"hikaricp.connections.usage.sum.us":{"value":416000.0},"hikaricp.connections":{"value":3.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections.acquire.count":{"value":60},"hikaricp.connections.acquire.sum.us":{"value":172080.99},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.idle":{"value":3.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}
Sep 28 16:34:33 env[3405743]: {"metricset":{"timestamp":1601310873073000,"tags":{"pool":"HikariPool-1"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":64},"hikaricp.connections.usage.sum.us":{"value":896000.0},"hikaricp.connections.acquire.count":{"value":64},"hikaricp.connections.acquire.sum.us":{"value":243931.7},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections":{"value":3.0},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.idle":{"value":3.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}
Sep 28 16:35:03 env[3405743]: {"metricset":{"timestamp":1601310903073000,"tags":{"pool":"HikariPool-2"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":74},"hikaricp.connections.usage.sum.us":{"value":423000.0},"hikaricp.connections":{"value":2.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections.acquire.count":{"value":74},"hikaricp.connections.acquire.sum.us":{"value":184358.326},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.idle":{"value":2.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}
Sep 28 16:35:03 env[3405743]: {"metricset":{"timestamp":1601310903073000,"tags":{"pool":"HikariPool-1"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":78},"hikaricp.connections.usage.sum.us":{"value":903000.0},"hikaricp.connections.acquire.count":{"value":78},"hikaricp.connections.acquire.sum.us":{"value":269137.9},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections":{"value":2.0},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.idle":{"value":2.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}

Possible approaches to solve this

While we can apply workarounds that are specific to HikariCP or even to our instrumentation of Micrometer, it seems that a proper fix for this needs to be applied at either agent or apm-server level.

Here is a non-definitive list of ideas that we (java agent team) have come so far:

  1. Agent could simply rename metrics on the fly, for example replace . with _. While simple to change, that would make such metrics different from others that use dots and break compatibility with existing metrics and even impact APM UI as current visualizations rely on metric names. Another downside is with custom metrics, those will have different (but close) names which could be confusing.
  2. Agent (or server) could post-process any metricset to ensure that no metrics conflict by adding an extra _value field automatically when required. This would work as long as those metrics are sent together in the same metricset, which is quite likely. In the example above all metrics related to a database connection pool will always reported together.
  3. Add a configuration option at agent-level that will allow to rename those conflicting metrics. This would assume that the frequency of such cases is small enough to not be a constant burden. Given that hikaricp is the default connection pool for spring-boot applications, which itself is a very popular framework, that's not a very appealing option for Java agent.

Currently, we think that the option 2. is the best for the long term, and having a few metrics with an extra _value attribute should be straight-forward for the end-user.

What are the other options that we could implement for this ?


Current workaround

  • with agent version 1.18.0, add micrometer to disable_instrumentations, this will completely disable micrometer integration.
  • use latest snapshot version or 1.19.0 ( not released as time of writing) and set disabled_metrics=hikaricp.connections

SylvainJuge avatar Sep 29 '20 13:09 SylvainJuge

The server team discussed this just now, and we feel like option 2 is too fragile, since it only works if the metrics are all part of the same metric set. If for example an agent were to sometimes omit (delta) metrics that did not change in the most recent interval, then that could mean there is no conflict and the name would flip. That would then lead to an indexing conflict.

In the shorter term, could the Java agent rewrite the specific problematic hikaricp metric (e.g. hikaricp.connections -> hikaricp.connections.total)? It wouldn't necessarily need to be configurable. This obviously isn't perfect, but it would resolve the immediate issue.

The server team will also look into our options going forward, and raise enhancement requests with the Elasticsearch team.

axw avatar Sep 30 '20 12:09 axw

Agent could simply rename metrics on the fly, for example replace . with _. While simple to change, that would make such metrics different from others that use dots and break compatibility with existing metrics

I learned yesterday of https://github.com/elastic/kibana/issues/69908. This relies on index patterns and therefore isn't immediately useful for dynamic fields, but it may also be possible to introduce a _meta field which sets a display name for a field.

I'm not sure if/how we could dynamically set that though; this doesn't fit with the current proposal for dynamic mapping type hints: https://github.com/elastic/elasticsearch/issues/61939

and even impact APM UI as current visualizations rely on metric names. Another downside is with custom metrics, those will have different (but close) names which could be confusing.

I assume you mean that we would rename things like process, runtime, etc. metrics? We could treat these well-defined metrics differently. In a way they are already treated differently; they have explicit field mappings.

axw avatar Oct 02 '20 02:10 axw

While option 2 is less fragile when done in the agent and not on the server, it's also not perfect. When adding a custom metric foo during the startup of the application and foo.bar at a later point in time, we'll also end up with mapping conflicts.

To fix that, users have to delete the metric index and rename the foo metric.

The only option I see to fix that in a systematic way is to disallow/replace dots for custom metrics. Interestingly, in Prometheus, dots in metric names are disallowed: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

It [the metric name] may contain ASCII letters and digits, as well as underscores and colons. It must match the regex [a-zA-Z_:][a-zA-Z0-9_:]*.

felixbarny avatar Oct 12 '20 09:10 felixbarny

I agree that's the best option for now. There's a new Elasticsearch enhancement request to support dotted field names at: https://github.com/elastic/elasticsearch/issues/63530

axw avatar Oct 13 '20 01:10 axw

Since the underlying issue is with Elasticsearch, it might take a while before we can rely on a proper fix for that. Thus in the mean time, we still need to add a work-around.

The current workaround requires to use disable_metrics: :+1: quite simple to configure once you know what value to set :+1: the original metric names expected by user are kept :-1: we remove the conflicting metric, but loose it's value :-1: the mapping conflict is still triggered server side and will require to delete existing metrics

Because option 2 (add suffix automatically) is not reliable, we won't explore it further.

Thus, we are left with three options:

  • rename metrics that contain dots in their names :question: do we apply this to 1) all metrics, 2) only custom metrics set by user (includes JMX) 3) micrometer metrics :question: what about existing metrics and impacts on UI if renaming existing ones :-1: those metrics will become inconsistent with other metrics and with names that do not match what user expects. :-1: would trigger a breaking change if we want to revert this later, only users that built things on those metrics would be impacted though.
  • add a configuration option (maybe internal) to add suffix to a list of known metrics. :+1: slightly better than current workaround as it allows to keep metric value :-1: same for the mapping conflict
  • wait until we have better options

Do we agree on:

  • doing the "rename option" is the best thing to do at agent level right now
  • apply this to all metrics that are under user control (Micrometer, JMX), and leave existing metrics that are used for JVM UI in Kibana as-is.

SylvainJuge avatar Oct 13 '20 14:10 SylvainJuge

Since the optimal solution would be through the implementation of https://github.com/elastic/elasticsearch/issues/63530, and we will eventually want to rely on it, maybe trying to get information on when this is expected can help. I'd say if we think it is not too far ahead, we can do now with disabling metrics. Renaming metrics is easy and robust but it means breaking now (for users who rely on such metrics) and re-breaking when we get back to report dotted names.

eyalkoren avatar Oct 14 '20 07:10 eyalkoren

Another occurrence at https://discuss.elastic.co/t/hikaricp-metrics-ingestion-error/259875

lucabelluccini avatar Feb 02 '21 09:02 lucabelluccini

Also, the latest update on the underlying issue refers to flattened fields, which is currently WIP, thus we can expect to have good news on the topic soon https://github.com/elastic/elasticsearch/issues/63530#issuecomment-771842245 :crossed_fingers: !

SylvainJuge avatar Feb 02 '21 18:02 SylvainJuge

Did this get fixed and released at some point? I see some merged PRs.

OrangeDog avatar Mar 22 '22 12:03 OrangeDog

@OrangeDog at least from the latest updates, we have an option for the APM Java Agent which will, by default, dedots the metrics. E.g. connections.idle becomes connections_idle. The dedotting:

  • will avoid problems unless the metric is using a different data type for the same metric (which is not common)
  • allows to ingest metrics such as connections=100, connections.idle=89 as they'll be indexed as connections and connections_idle
  • cannot prevent some problems related to float/integers (see https://github.com/elastic/apm-server/issues/5682)
  • might lead to too many fields (if you have hundreds of metrics)

The solution would be to rely on Elasticsearch feature request https://github.com/elastic/elasticsearch/issues/63530#issuecomment-771842245, which is not yet ready.

lucabelluccini avatar Mar 22 '22 13:03 lucabelluccini

Hello, I am reaching out to let you know that dedotting will no longer be needed from Elasticsearch 8.3 onwards. See https://github.com/elastic/elasticsearch/pull/86166 . Any object can be configured in the mappings to not hold any further subobjects in which case dots in field names will be left as-is so that connections and connections.idle can both hold values. Note that this is a different approach compared to the comment that was linked above around using flattened field for this usecase. We are rather providiing a solution that allows your solution to replace dedotting and leave dots in field names untouched.

javanna avatar May 17 '22 15:05 javanna

@axw is that all we need in order to support dots in names? I'm not so sure.

In contrast to the examples in https://github.com/elastic/elasticsearch/pull/86166, we don't have a metrics object under which all metrics are nested. Should we set subobjects: false to the first object in a metric then? For example, for the metric foo.bar, set the mapping of the foo object to subobjects: false. However, that can still conflict if there's another metric called foo.

@javanna for TSDB, is there a plan to automatically map metric fields (those that are not dimensions) in a way so that they are never objects? Something like subobjects: false on the root level, but only for metric fields.

felixbarny avatar May 18 '22 07:05 felixbarny

is that all we need in order to support dots in names?

@felixbarny I'm also not sure. If we have to add a metrics. prefix to all metrics, then that would require breaking changes because users may have alerts on their metrics, for example.

In https://github.com/elastic/elasticsearch/issues/63530#issuecomment-945471746 it was suggested that we would be able to set flattened: true on the root object.

Something like subobjects: false on the root level, but only for metric fields.

This is what I was originally hoping for, and what I was describing in https://github.com/elastic/elasticsearch/issues/63530#issuecomment-946324108.

@romseygeek made the point that it doesn't really matter whether they're flattened or not, unless we specifically care about the _source structure (which I don't think we do). So, if we can set subobjects: false at the root level, then I think we're good?

axw avatar May 18 '22 07:05 axw

Right, I think we'd be fine even with metadata fields, such as host.name not being mapped as an object. Being able to set subobjects: false at the root level sounds like a good idea. @javanna would that be feasible?

felixbarny avatar May 18 '22 07:05 felixbarny

Heya, happy that I have gotten your attention, there was some confusion in the discussion in the ticket mentioned above. Can I get an example document where you need to use this feature so I can better understand what the way forward is and make recommendations?

The idea would be to find a way to not have to make breaking changes or change your documents. The solution we came up with is hopefully flexible enough to give the option to either have the whole document without subobjects, or only a specific section of it.

javanna avatar May 18 '22 07:05 javanna

An example document:

{
  "host": {
    "name": "myhost"
  },
  "foo": 42,
  "foo.bar": 42,
  "baz": 21,
  "baz.qux": 21
}

felixbarny avatar May 18 '22 07:05 felixbarny

@javanna important to note in the above:

  • host.name would be a field defined statically, ahead of time (this is an ECS field, we define it in a component template)
  • foo, foo.bar, baz, and baz.qux would be dynamically mapped metrics

axw avatar May 18 '22 07:05 axw

Ok with this document there is no doubt that you'll have to set subobjects: false to the root for foo and baz to both hold a value as well as be the prefix for foo.bar and baz.qux.

Though there is a small issue with the current implementation: host can't be specified as an object in the document if the root object is configured to not hold objects. Is it feasible to consistently provide fields with dots in their names like host.nameand have no inner objects? Or does host need to be an object in your documents?

javanna avatar May 18 '22 07:05 javanna

Ok with this document there is no doubt that you'll have to set subobjects: false to the root

Is that already possible?

host can't be specified as an object in the document if the root object is configured to not hold objects. Is it feasible to consistently provide fields with dots in their names like host.nameand have no inner objects? Or does host need to be an object in your documents?

Is it an issue if the source document contains nested objects? Could ES convert that into dotted fields during indexing? If not, I suppose APM Server could modify the fields to convert nesting into dotting. It doesn't sound great but probably feasible.

But whatever we do, we should make sure to closely align with TSDB. Not sure if there's a catch when mapping metric fields and/or dimension fields as subobjects: false.

felixbarny avatar May 18 '22 07:05 felixbarny

Ok with this document there is no doubt that you'll have to set subobjects: false to the root for foo and baz to both hold a value as well as be the prefix for foo.bar and baz.qux.

:+1:

Though there is a small issue with the current implementation: host can't be specified as an object in the document if the root object is configured to not hold objects. Is it feasible to consistently provide fields with dots in their names like host.name and have no inner objects? Or does host need to be an object in your documents?

AFAIK we don't have any specific need for it to be an object, but this is not entirely within our control; Fleet creates the index & component templates. I'll get in touch with the Fleet folks to see if we can flatten the mappings.

It seems like more changes are needed in Elasticsearch though, is that right? This fails:

PUT /foo
{
  "mappings" : {
    "subobjects": false,
    "properties": {
      "host.name": {
        "type": "keyword"
      }
    }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping: Object [_doc] has subobjects set to false hence it does not support inner object [host]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping: Object [_doc] has subobjects set to false hence it does not support inner object [host]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Object [_doc] has subobjects set to false hence it does not support inner object [host]"
    }
  },
  "status": 400
}

axw avatar May 18 '22 07:05 axw

Using non-object mappings for ECS fields, such as host.name doesn't sound so great.

There's an effort around making ECS mappings more native in Elasticsearch: https://github.com/elastic/elasticsearch/issues/85692

We'd need to consider two versions of the mapping then - an object-based and a flat one.

felixbarny avatar May 18 '22 08:05 felixbarny

Ok @felixbarny let me think about that, we can try and handle that in ES.

@axw for the error above, I am looking into it.

javanna avatar May 18 '22 08:05 javanna

@axw The error you got does not look like it's returned from the above create index call, but rather from indexing a doc with a host object, or trying to map host as an object, which is the very same problem I mentioned above and I am discussion with Felix too. Can you confirm I got it right?

javanna avatar May 18 '22 08:05 javanna

@javanna nope, I ran that exact PUT /foo request in dev tools -- just tried it again. I'm running the latest 8.3.0-SNAPSHOT, 0418e8a9d824b33c02d4d84ee2ffff91287c9472.

axw avatar May 18 '22 09:05 axw

@axw I found that problem and opened a PR (https://github.com/elastic/elasticsearch/pull/86894), thanks for spotting it.

@felixbarny I wanted to get back to what you said above "Using non-object mappings for ECS fields, such as host.name doesn't sound so great.". We need to get on the same page here: one thing is whether documents contain objects or not, for instance we could relax the current restriction and allow you to still provide host: {name: ""} in an object shape in your documents. This requires extra work on our end but it's doable. Another thing is allowing for mappings to hold objects when subobjects are disabled: this is not possible. If your root object is configured to not hold subobjects, host cannot be an object in your mappings, and in that case specifying host.name as a keyword field in your mapping will be treated differently, meaning host will not be made an object automatically. I hope, but we need to verify this, that you don't specify host as an object in your mappings.

I hope this clarifies things, but just in case, I would like to see your mappings and real sample documents. It would be great for us also to base our tests on your real usecase rather than dummy documents.

Please ping me and let's chat on a zoom if things are unclear.

javanna avatar May 18 '22 14:05 javanna

@javanna and I had a chat on zoom.

A container object for metrics (all metrics are nested under a metrics object) would make things simpler and probably cleaner, too. Currently, there is a risk that a metric name conflicts with a metadata field. For example, if a metric is called host or host.name. In practice, it seems unlikely that there will often be conflicts, but it's a possibility. But introducing a container object now would be a breaking change.

Currently, the mapping templates we generate are nested. For example:

{
  "mappings": {
    "properties": {
      "host": {
        "properties": {
          "name": {
            "type": "keyword",
            "ignore_above": 1024
          }
        }
      }
    }
  }
}

If we flatten the mappings like this:

{
  "mappings": {
    "dynamic": "true",
    "properties": {
      "host.name": {
        "type": "keyword",
        "ignore_above": 1024
      }
    }
  }
}

They can work both for the case where subobjects is set to true and false. When set to false, Elasticsearch will automatically create nested objects. As Andrew said, that's something we'd need to change in Fleet but it seems doable at first glance.

Another thing to watch out for when mapping the root with subobjects: false is that even the metadata fields in the metric documents need to be flattened. So either APM Server or Elasticsearch needs to transform

{
  "host": {
    "name": "myhost"
  },
  "foo": 42,
}

Into

{
  "host.name": "myhost",
  "foo": 42,
}

I agreed with @javanna that for now (8.3), Elasticsearch should not automatically do the flattening. If it's not feasible to do it in APM Server, we can talk about adding a flattening feature to Elasticsearch.

Luca also explained to me that the very same limitations for dotted field names apply to TSDB.

felixbarny avatar May 19 '22 10:05 felixbarny

I think the metrics. prefix is the best idea. This will be a breaking change anyway, as any existing metrics would have to have replaced the dots somehow.

OrangeDog avatar May 19 '22 10:05 OrangeDog

I don't mean to interfere here and I think that you folks need to have an internal discussion to align on the matter, but based on my preliminary chat with Felix if felt like we are trying not to make breaking changes, and we should be able to achieve that. Please ping me if you discuss this, I am glad to be involved.

javanna avatar May 19 '22 15:05 javanna

@axw heads up: the bug you hit at your first try is now fixed in master. Thanks for your patience :)

javanna avatar May 23 '22 09:05 javanna

I think the metrics. prefix is the best idea. This will be a breaking change anyway, as any existing metrics would have to have replaced the dots somehow.

@OrangeDog we only replace dots when we don't know the metric names ahead of time, e.g. for custom application metrics.

For builtin metrics (e.g. https://www.elastic.co/guide/en/apm/agent/java/current/metrics.html), we are not de-dotting. So for example, the builtin metric jvm.memory.heap.used will be dynamically mapped as "jvm": {"memory": {"heap": {"used": .... Elasticsearch already allows this to be addressed as jvm.memory.heap.used in search queries, and will return the flattened form in fields in search hits.

I do think having a prefix would be neater and simpler, but we would like to avoid breaking users for the moment.

I agreed with @javanna that for now (8.3), Elasticsearch should not automatically do the flattening. If it's not feasible to do it in APM Server, we can talk about adding a flattening feature to Elasticsearch.

Luca also explained to me that the very same limitations for dotted field names apply to TSDB.

If we need to flatten all metadata fields at ingest time, then we will need to perform some additional testing around the _source storage cost. As there would be a lot of redundancy, it may increase storage size significantly, but perhaps it will compress well enough. If size increases significantly, we might need to tie in with https://github.com/elastic/elasticsearch/issues/86603.

axw avatar May 24 '22 04:05 axw