APM Data plugin asset installation feedback loop

Open lucabelluccini opened this issue 1 year ago • 6 comments

After 8.15.0, the APM assets are installed by the apm-data plugin in ES. In case an error occurs, we've observed:

  • Elasticsearch logs errors (good) as below
  • It seems to retry installing the assets continuously (good, but maybe it should retry periodically instead?)
[instance-0000000054] error adding index template [traces-apm@template] for [apm] java.lang.IllegalArgumentException: composable template [traces-apm@template] template after composition with component templates [traces@mappings, apm@mappings, apm@settings, apm-10d@lifecycle, traces-apm@mappings, traces-apm-fallback@lifecycle, traces@custom, traces-apm@custom, ecs@mappings] is invalid
    at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateIndexTemplateV2(MetadataIndexTemplateService.java:753) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.addIndexTemplateV2(MetadataIndexTemplateService.java:640) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.addIndexTemplateV2(MetadataIndexTemplateService.java:587) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$5.execute(MetadataIndexTemplateService.java:532) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$1.executeTask(MetadataIndexTemplateService.java:149) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$1.executeTask(MetadataIndexTemplateService.java:146) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.SimpleBatchedExecutor.execute(SimpleBatchedExecutor.java:70) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1070) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1033) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:233) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1686) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.action.ActionListener.run(ActionListener.java:444) ~[elasticsearch-8.15.5.jar:?]
    at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1683) ~[elasti

In this particular case, the cause was a mapping conflict between traces-apm@custom (provided by the user) and traces-apm@mappings (ours, which changed between 8.12 and 8.15).
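As a starting point for triage, the error message itself lists every component template that went into the failed composition. The sketch below pulls out the template name and the components ending in @custom, on the assumption that those are the user-managed ones (as traces-apm@custom was here). The function name and the regular expression are illustrative only: the pattern is written against the log line quoted in this issue, and the wording may differ in other Elasticsearch versions. It only narrows down where to look; it does not find the conflicting field.

```python
import re
from typing import List, Optional, Tuple

# Pattern modeled on the log line quoted in this issue (assumption: the
# message wording is stable enough to match on).
COMPOSITION_ERROR = re.compile(
    r"composable template \[(?P<template>[^\]]+)\] template after composition "
    r"with component templates \[(?P<components>[^\]]+)\] is invalid"
)


def custom_components(log_message: str) -> Optional[Tuple[str, List[str]]]:
    """Return (template name, '@custom' components) or None if nothing matches.

    Hypothetical helper: '@custom' components are treated as user-managed
    and therefore as the first candidates to inspect.
    """
    match = COMPOSITION_ERROR.search(log_message)
    if match is None:
        return None
    components = [c.strip() for c in match.group("components").split(",")]
    custom = [c for c in components if c.endswith("@custom")]
    return match.group("template"), custom


if __name__ == "__main__":
    message = (
        "error adding index template [traces-apm@template] for [apm] "
        "java.lang.IllegalArgumentException: composable template "
        "[traces-apm@template] template after composition with component "
        "templates [traces@mappings, apm@mappings, apm@settings, "
        "apm-10d@lifecycle, traces-apm@mappings, traces-apm-fallback@lifecycle, "
        "traces@custom, traces-apm@custom, ecs@mappings] is invalid"
    )
    print(custom_components(message))
```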

But the user gets no feedback unless they are looking at the Elasticsearch logs. We should find a way to expose this in Kibana.

Other comments:

  • Each APM data stream is handled individually, so some asset installations might succeed while others fail (e.g. traces-apm@template might be successful, but traces-apm.rum@template might fail)
  • There is no way for users to "test" the setup prior to installing the assets

Failure to install assets can cause data loss, so it's not something that can be left unaddressed.

FYI @simitt (not urgent but raised to keep track of it)

lucabelluccini, Dec 04 '24 13:12

@mlunadia for awareness - we would need to collaborate with the UI team to bring this feedback to customers. Happy to provide more context where needed, but would appreciate your input here.

simitt, Dec 05 '24 10:12

@simitt which UI team specifically? Can we raise an issue in their repo describing the solution we are suggesting to surface these errors?

mlunadia, Dec 05 '24 14:12

@mlunadia I don't think we have a mapped-out solution, hence it is also not 100% clear which UI team. I am not aware that the Fleet-managed apm setup offered a UI for seeing mapping conflicts (these could also happen then; this is not a problem introduced by the switch to the apm ES plugin).

simitt, Dec 06 '24 07:12

Apart from highlighting mapping conflicts, the key point is that the APM plugin in ES is unable to "finalize" its initialization, which probably leaves data broken, lost or inconsistent.

Identifying this problem and exposing it in the UI and/or monitoring would make it easier for everyone.

Instead, we're relying on the customer going to the Elasticsearch logs, finding the errors mentioned in the initial issue, and knowing what might have triggered them.

We do not even document traces-apm@template (or the other APM index templates) publicly, which basically makes users rely on the Discuss forum or Support to solve the problem.

At the time of writing, our global logging matches ~20 deployments with errors "for [apm] java.lang.IllegalArgumentException:"
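For anyone wanting to run a similar check on their own Elasticsearch logs, here is a minimal sketch that tallies which index templates the apm plugin failed to install. The function name is hypothetical and the pattern is an assumption based on the log line quoted at the top of this issue. Because the plugin appears to retry continuously, the counts reflect retries rather than distinct incidents; the set of keys is the more useful signal, since it shows which data streams are affected.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: install failures are logged as
# "error adding index template [<name>] for [apm] ...", as in this issue.
INSTALL_ERROR = re.compile(
    r"error adding index template \[(?P<template>[^\]]+)\] for \[apm\]"
)


def count_failed_templates(log_lines: Iterable[str]) -> Counter:
    """Count apm asset installation errors per index template name."""
    failures: Counter = Counter()
    for line in log_lines:
        match = INSTALL_ERROR.search(line)
        if match:
            failures[match.group("template")] += 1
    return failures


if __name__ == "__main__":
    sample = [
        "[instance-0000000054] error adding index template "
        "[traces-apm@template] for [apm] java.lang.IllegalArgumentException: ...",
        "[instance-0000000054] error adding index template "
        "[traces-apm@template] for [apm] java.lang.IllegalArgumentException: ...",
        "[instance-0000000054] some unrelated log line",
    ]
    for template, count in count_failed_templates(sample).items():
        print(template, count)
```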

lucabelluccini, Mar 31 '25 15:03

@akhileshpok for awareness and input on priority.

Apart from highlighting mapping conflicts, the key point is that the APM plugin in ES is unable to "finalize" its initialization, which probably leaves data broken, lost or inconsistent.

I wonder if this should be discussed with the ES team, as to how ES issues could be better highlighted in Kibana?

simitt, Mar 31 '25 15:03

+1, another user hit this.

lucabelluccini, Nov 20 '25 14:11