Move index templates and ingest pipelines from apm package to ES
Add index templates and ingest pipelines, currently defined in the apm-package, to Elasticsearch itself.
See a proof of concept at https://github.com/elastic/elasticsearch/pull/97546. With this in place, APM could by default index into a fresh Elasticsearch cluster without first checking that the templates and pipelines are set up.
After https://github.com/elastic/elasticsearch/pull/97546 is merged, we'll need to follow up by extending the YAML REST tests (see https://github.com/elastic/elasticsearch/blob/d832ac65b3cd072263dc198fd211f7f87787ec9b/x-pack/plugin/apm-data/src/yamlRestTest/resources/rest-api-spec/test/10_apm.yml), and fixing/enhancing the index templates and ingest pipelines accordingly. These tests should index some sample documents and assert that they're processed and mapped as expected.
With https://github.com/elastic/apm-server/pull/12066, and the changes in https://github.com/elastic/elasticsearch/pull/103032, I'm able to get all APM Server system tests to pass. There will be some differences to approvals: there are additional .text subfields, and we're not removing fields from span documents like we do in the integration's ingest pipeline. I think both of those changes are desirable.
Before we turn the plugin on by default, we'll need to update https://www.elastic.co/guide/en/elasticsearch/reference/master/snapshots-restore-snapshot.html#restore-entire-cluster to include disabling/re-enabling the apm-data registry.
I did some more local testing with https://github.com/elastic/apm-server/pull/12066, and found that TestJaeger fails because labels.location is matching the ecs_geo_point dynamic template in ecs@mappings: https://github.com/elastic/elasticsearch/blob/029624000b039516f60c14228492e3164bce6c61/x-pack/plugin/core/template-resources/src/main/resources/ecs%40mappings.json#L186-L196
I think those path_match rules are a bit too wide. I'll see if we can narrow them to only match geo.location.
The location rule was narrowed to geo.location in https://github.com/elastic/elasticsearch/pull/108349.
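To illustrate why the wide rule caught `labels.location`, here is a minimal sketch of glob-style path matching. The pattern lists are assumptions made for illustration (check the linked `ecs@mappings.json` and the PR for the actual rules), and Python's `fnmatchcase` only approximates how Elasticsearch evaluates `path_match` against the full dotted field path.

```python
from fnmatch import fnmatchcase

# Assumed patterns, for illustration only; the real rules live in ecs@mappings.json.
WIDE_PATTERNS = ["location", "*.location"]
NARROW_PATTERNS = ["geo.location", "*.geo.location"]


def path_matches(field_path: str, patterns: list[str]) -> bool:
    """Return True if the full dotted field path matches any glob pattern."""
    return any(fnmatchcase(field_path, pattern) for pattern in patterns)


if __name__ == "__main__":
    for field in ("labels.location", "client.geo.location"):
        print(
            field,
            "wide:", path_matches(field, WIDE_PATTERNS),
            "narrow:", path_matches(field, NARROW_PATTERNS),
        )
```

Under these assumed patterns, `labels.location` matches the wide set but not the narrow one, while a genuine ECS field such as `client.geo.location` matches both.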
Before switching to apm-data by default, we may want to get https://github.com/elastic/kibana/pull/183250 in, as we found some issues with the service map (and other script aggs) when fields are missing from mappings.
I've been working on confirming that index templates and pipelines are aligned between the integration and the plugin.
For ingest pipelines, I wrote up a document highlighting the differences. Work is in progress to address comments and align where necessary.
For index templates, I generated all final index templates through the _simulate API for both the integration and the plugin, and created a document listing the differences.
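A comparison like the one described above could be sketched roughly as follows. This is not the script that was actually used; it assumes the `mappings.properties` object has already been extracted from each simulate response (for example from `POST /_index_template/_simulate_index/<index-name>`), and the function names and sample mappings are hypothetical.

```python
from typing import Any


def flatten_mappings(properties: dict[str, Any], prefix: str = "") -> dict[str, dict[str, Any]]:
    """Flatten nested mapping `properties` into {dotted.path: field definition}."""
    flat: dict[str, dict[str, Any]] = {}
    for name, definition in properties.items():
        path = f"{prefix}{name}"
        # Keep everything except nested `properties` as the field's own definition.
        flat[path] = {k: v for k, v in definition.items() if k != "properties"}
        if "properties" in definition:
            flat.update(flatten_mappings(definition["properties"], prefix=f"{path}."))
    return flat


def diff_mappings(integration: dict[str, Any], plugin: dict[str, Any]) -> dict[str, tuple]:
    """Return {path: (integration_def, plugin_def)} for every field that differs."""
    left = flatten_mappings(integration)
    right = flatten_mappings(plugin)
    return {
        path: (left.get(path), right.get(path))
        for path in sorted(left.keys() | right.keys())
        if left.get(path) != right.get(path)
    }


if __name__ == "__main__":
    # Hypothetical sample mappings, not taken from the real templates.
    integration = {"service": {"properties": {"name": {"type": "keyword"}}}}
    plugin = {
        "service": {
            "properties": {
                "name": {"type": "keyword", "fields": {"text": {"type": "match_only_text"}}}
            }
        }
    }
    for path, (old, new) in diff_mappings(integration, plugin).items():
        print(path, old, new)
```

Flattening to dotted paths keeps the diff readable and makes differences such as an extra `.text` subfield show up as a changed definition on a single field.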
@endorama I see that @axw has left some comments on the doc; what else do you need for moving forward with this?
Other things we need to do:
- [x] https://github.com/elastic/elasticsearch/pull/108860
- [x] https://github.com/elastic/elasticsearch/pull/108885
- [x] https://github.com/elastic/elasticsearch/pull/108862
- [x] https://github.com/elastic/integrations/pull/9949
- [x] perform some manual upgrade testing from 8.14.x to 8.15.0
- [ ] ~update docs to remove requirement for installing the integration (e.g. at https://www.elastic.co/guide/en/observability/current/apm-getting-started-apm-server.html)~ moved to https://github.com/elastic/apm-server/issues/13237
I created https://github.com/elastic/apm-server/issues/13237 to track updating the documentation separately from this task.
What's left before closing this is running the benchmark to verify ingestion performance; I'm running it today and will report back later.
I'm closing this issue as I'm tackling the benchmarks to confirm performance parity in https://github.com/elastic/apm-managed-service/issues/692