Fix: apm package needs to be up-to-date when bundled with Kibana
Kibana bundles the most recently published version of the apm package and renames it to be aligned with the Kibana version.
When bumping versions, this can mean that while Kibana is already on 8.8.0-SNAPSHOT, the apm package is still on 8.7.0-SNAPSHOT and is not up to date. This leads to problems when updating testing clusters.
- A new apm package needs to be pushed in time for Kibana to pick it up
- Kibana should not just rename an existing package if it is not on the same version.
Discuss with the Fleet team how to solve this.
I think the biggest issue that arises today is indexing being blocked, due to this:
https://github.com/elastic/apm-server/blob/52b02b7323b0026b08f279c680c59a5a2ba35bfe/apmpackage/cmd/genpackage/pipelines.go#L50-L63
i.e. the 8.8.0-SNAPSHOT build of Kibana has bundled integration package 8.7.0-SNAPSHOT, rewritten the version to 8.8.0-SNAPSHOT, but internally it still has this pipeline that requires APM Server be < 8.8. So an 8.8.0-SNAPSHOT deployment will fail until Kibana bundles a new package version, which leaves a window (sometimes days) where things are broken.
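To make the failure mode concrete, here is a minimal sketch of what happens when only the manifest version is rewritten. It is a toy model, not the actual Kibana bundling code or the real pipeline from `pipelines.go`; the `Package` type and the `build_package`, `rename_only`, and `pipeline_accepts` helpers are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, replace
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Parse a version such as '8.8.0-SNAPSHOT' into (8, 8)."""
    core = version.split("-", 1)[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


@dataclass(frozen=True)
class Package:
    # Version that Kibana/Fleet sees in the package manifest.
    manifest_version: str
    # Highest APM Server (major, minor) the ingest pipeline accepts.
    # Assumption: this is baked into the pipeline when the package is built.
    pipeline_max: Tuple[int, int]


def build_package(version: str) -> Package:
    """Hypothetical package build: manifest and pipeline agree."""
    return Package(version, parse_major_minor(version))


def rename_only(pkg: Package, new_version: str) -> Package:
    """Models the bundling step: the manifest changes, the pipeline does not."""
    return replace(pkg, manifest_version=new_version)


def pipeline_accepts(pkg: Package, observer_version: str) -> bool:
    """Models the version check: reject documents from a newer APM Server."""
    return parse_major_minor(observer_version) <= pkg.pipeline_max


bundled = rename_only(build_package("8.7.0-SNAPSHOT"), "8.8.0-SNAPSHOT")
print(bundled.manifest_version)                      # 8.8.0-SNAPSHOT
print(pipeline_accepts(bundled, "8.8.0-SNAPSHOT"))   # False
```

The renamed package claims to be 8.8.0-SNAPSHOT, yet its pipeline still rejects documents from an 8.8.0-SNAPSHOT APM Server, which is the blocked-indexing window described above.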
I can think of a few options for addressing this:
1. Update the bundling process to rewrite that pipeline as well, at the time the version is rewritten.
2. Encode a data schema (semantic) version, which we would manually bump whenever the producer adds, changes, or removes a field. We would then update the pipeline to match.
3. We could drop the check altogether.
(1) adds a very small risk of indexing issues, in that an 8.8.0-SNAPSHOT Kibana could bundle what is really an 8.7.0-SNAPSHOT package, which would then accept data from 8.8.0-SNAPSHOT apm-servers. As long as this doesn't persist for longer than a few days, it should be fine.
(2) seems the cleanest, but adds some overhead to the process of updating fields. Maybe that's OK?
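As a rough sketch of how (2) might look, the check could compare schema versions instead of product versions. Everything here is an assumption for illustration: the `major.minor` schema version format, the helper names, and in particular the compatibility rule (same major, producer minor not newer than the package's) are not taken from apm-server.

```python
from typing import Tuple


def parse_schema_version(version: str) -> Tuple[int, int]:
    """Parse a hypothetical 'major.minor' schema version such as '1.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def schema_compatible(producer_schema: str, package_schema: str) -> bool:
    """Assumed rule: same major, and the producer's minor is not newer
    than the minor the package's pipeline was written for."""
    producer = parse_schema_version(producer_schema)
    package = parse_schema_version(package_schema)
    return producer[0] == package[0] and producer[1] <= package[1]


# A product bump from 8.7 to 8.8 with no field changes keeps schema 1.4,
# so a stale bundled package would still accept the data.
print(schema_compatible("1.4", "1.4"))  # True
# A producer that added a field (1.5) is rejected by an older package (1.4).
print(schema_compatible("1.5", "1.4"))  # False
```

The overhead mentioned above is the manual step: someone has to remember to bump the schema version whenever a field is added, changed, or removed.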
(3) seems risky at first blush, but given that we're now using dynamic: strict and dynamic: false in almost all data streams, it might actually be OK; data would be rejected if it doesn't fit the strict mapping. We have only one data stream that has dynamic: true -- metrics-apm.app-*, and we could look at whether this is possible to change. If a newer version of APM Server sends data that is incompatible with the ingest pipeline it could cause an error, but that's no worse than if it were to send data that doesn't fit the strict mapping.
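For reference, the three `dynamic` modes behave roughly as in the toy model below. This is a simplified simulation over flat field names, not Elasticsearch code; `index_document` is a hypothetical helper, and it ignores nested objects, runtime fields, and type conflicts.

```python
from typing import Any, Dict, Set


def index_document(mapped_fields: Set[str], dynamic: str,
                   doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return the fields that end up indexed, mimicking the `dynamic` setting.

    mapped_fields is mutated when dynamic == "true", mirroring a mapping update.
    """
    unknown = [name for name in doc if name not in mapped_fields]
    if dynamic == "strict":
        if unknown:
            # Elasticsearch rejects the whole document in this case.
            raise ValueError(f"unmapped fields not allowed: {unknown}")
        return dict(doc)
    if dynamic == "false":
        # Unknown fields stay in _source but are not indexed or searchable.
        return {k: v for k, v in doc.items() if k in mapped_fields}
    if dynamic == "true":
        # Unknown fields are added to the mapping automatically.
        mapped_fields.update(unknown)
        return dict(doc)
    raise ValueError(f"unsupported dynamic mode: {dynamic}")


doc = {"service.name": "checkout", "new.field": 1}
print(index_document({"service.name"}, "false", doc))  # {'service.name': 'checkout'}
```

Only the `true` mode lets a newer APM Server change the mapping, which is why `metrics-apm.app-*` is the data stream that needs a closer look.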
I think (3) would be possible with some changes to Elasticsearch to permit dynamically mapping fields which have an explicitly named dynamic template with dynamic: strict (or some new mode): https://github.com/elastic/elasticsearch/issues/93951
For (3) we could alternatively use the approach described by https://github.com/elastic/elasticsearch/issues/12358#issuecomment-1100325098. It feels a bit dirty relying on an invalid mapping for the rejection, but it does work.
In https://github.com/elastic/apm-server/issues/10808 we moved away from strict mappings, so we are now using dynamic: false and dynamic: runtime, apart from app_metrics where we require dynamic mapping.
If we can ensure that dynamic mapping is safe for app_metrics, then we can remove the version check. We might do this by using the builtin metrics-mappings and data-streams-mappings component templates. If that proves too difficult because Fleet/package-spec controls the template definition, then we could cherry-pick the relevant parts, such as dynamic templates.
We already get these from the .fleet_globals-1 component template added by Fleet:
    "dynamic_templates": [
      {
        "strings_as_keyword": {
          "mapping": {
            "ignore_above": 1024,
            "type": "keyword"
          },
          "match_mapping_type": "string"
        }
      }
    ],
    "date_detection": false
The only other relevant thing in data-streams-mappings is dynamically mapping string "ip" fields as the IP field type. I think we can live without this.
I believe this issue is related to the many failing tests when main is bumped to a new version. With the increasing usage of the apm es synthtrace client, which requires the same apm package version to be installed, it's causing a lot of tests to break during this window where the versions don't match. Would be great to get this prioritized.
@axw has been working on moving the template and ingest pipeline setup from Fleet to an ES plugin (https://github.com/elastic/apm-server/issues/11528). The work has progressed well but is not yet done. We are aiming to finish it in one of the next minor versions, so we are not prioritizing improvements to the current Kibana bundling.
Is there any update on this issue? It blocks us from re-enabling certain tests.
We have moved the apm package to the integrations repo and removed the version check (https://github.com/elastic/integrations/pull/9185). I'm not aware of anything else blocking the tests from being re-enabled.